Applicant






SONY CORPORATION et al . 







This International Search Report has been prepared by this International Searching Authority and is transmitted to the applicant
according to Article 18. A copy is being transmitted to the International Bureau.
This International Search Report consists of a total of .

[X] It is also accompanied by a copy of each prior art document cited in this report. 



1. Basis of the report

a. With regard to the language, the international search was carried out on the basis of the international application in the
language in which it was filed, unless otherwise indicated under this item.

[ ] the international search was carried out on the basis of a translation of the international application furnished to this
Authority (Rule 23.1(b)).

b. With regard to any nucleotide and/or amino acid sequence disclosed in the international application, the international search
was carried out on the basis of the sequence listing:

[ ] contained in the international application in written form.
[ ] filed together with the international application in computer readable form.
[ ] furnished subsequently to this Authority in written form.
[ ] furnished subsequently to this Authority in computer readable form.
[ ] the statement that the subsequently furnished written sequence listing does not go beyond the disclosure in the
international application as filed has been furnished.
[ ] the statement that the information recorded in computer readable form is identical to the written sequence listing has been
furnished.

2. [ ] Certain claims were found unsearchable (see Box I).
3. [ ] Unity of invention is lacking (see Box II).



4. With regard to the title,

[X] the text is approved as submitted by the applicant.

[ ] the text has been established by this Authority to read as follows:



5. With regard to the abstract,

[X] the text is approved as submitted by the applicant.

[ ] the text has been established, according to Rule 38.2(b), by this Authority as it appears in Box III. The applicant may,
within one month from the date of mailing of this international search report, submit comments to this Authority.

6. The figure of the drawings to be published with the abstract is Figure No

[ ] as suggested by the applicant. [ ] None of the figures.
[X] because the applicant failed to suggest a figure.
[ ] because this figure better characterizes the invention.
other means

"P" document published prior to the international filing date but

"T" later document published after the international filing date
or priority date and not in conflict with the application but
cited to understand the principle or theory underlying the
invention

"X" document of particular relevance; the claimed invention

involve an inventive step when the document is taken alone
(12) INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT)



(19) World Intellectual Property Organization 
International Bureau 

(43) International Publication Date 
20 September 2001 (20.09.2001) 




PCT 



(10) International Publication Number 

WO 01/69936 A3 



(51) International Patent Classification 7: H04N 7/26, G06F 17/30, H04N 7/24, 7/50, 7/36

(21) International Application Number: PCT/JP01/01982 

(22) International Filing Date: 13 March 2001 (13.03.2001) 

(25) Filing Language: English 

(26) Publication Language: English 



(30) Priority Data:

2000-68720    13 March 2000 (13.03.2000)    JP
60/204,729    16 May 2000 (16.05.2000)      US



(71) Applicant (for all designated States except US): SONY 
CORPORATION [JP/JP]; 7-35, Kitashinagawa 6-chome, 
Shinagawa-ku, Tokyo 141-0001 (JP). 

(72) Inventor; and

(75) Inventor/Applicant (for US only): KUHN, Peter
[DE/JP]; c/o SONY CORPORATION, 7-35, Kitashinagawa
6-chome, Shinagawa-ku, Tokyo 141-0001 (JP).

(74) Agents: KOIKE, Akira et al.; No. 11 Mori Bldg., 6-4,
Toranomon 2-chome, Minato-ku, Tokyo 105-0001 (JP).

(81) Designated States (national): AU, CA, CN, JP, KR, US. 

(84) Designated States (regional): European patent (AT, BE, 
CH, CY, DE, DK, ES, FI, FR, GB, GR, IE, IT, LU, MC, 
NL, PT, SE, TR). 

Published: 

— with international search report 

(88) Date of publication of the international search report: 

28 February 2002 

For two-letter codes and other abbreviations, refer to the "Guidance
Notes on Codes and Abbreviations" appearing at the beginning
of each regular issue of the PCT Gazette.
(57) Abstract: An audio/video (or audiovisual, "A/V")
signal processing apparatus and method for extracting 
a compact representation of a multimedia description 
and transcoding hints metadata for transcoding 
between different (e.g., MPEG) compressed 
content representations, manipulating (e.g., MPEG 
compressed) bitstream parameters such as frame rate, 
bit rate, session size, quantization parameters, and 
picture coding type structure (e.g., group of pictures, 
or "GOP"), classifying A/V content, and retrieving 
PCT
RECORD COPY

(PCT Rule 24.2(a)) 



To: 



KOIKE, Akira 

No.11 Mori Bldg., 6-4, Toranomon 2- 
chome 

Minato-ku, Tokyo 105-0001 
JAPON 



Date of mailing (day/month/year) 
03 April 2001 (03.04.01) 


IMPORTANT NOTIFICATION 


Applicant's or agent's file reference 
SK01PCT25 


International application No. 
PCT/JP01/01982 



The applicant is hereby notified that the International Bureau has received the record copy of the international application as 
detailed below. 

Name(s) of the applicant(s) and State(s) for which they are applicants: 

SONY CORPORATION (for all designated States except US) 

KUHN, Peter (for US) 
International filing date: 13 March 2001 (13.03.01)

Priority date(s) claimed: 13 March 2000 (13.03.00)
                          16 May 2000 (16.05.00)

Date of receipt of the record copy
by the International Bureau: 26 March 2001 (26.03.01)

List of designated Offices

EP :AT,BE,CH,CY,DE,DK,ES,FI,FR,GB,GR,IE,IT,LU,MC,NL,PT,SE,TR
National :AU,CA,CN,JP,KR,US



ATTENTION 

The applicant should carefully check the data appearing in this Notification. In case of any discrepancy between these data 
and the indications in the international application, the applicant should immediately inform the International Bureau. 
In addition, the applicant's attention is drawn to the information contained in the Annex, relating to: 

[X] time limits for entry into the national phase

[X] confirmation of precautionary designations

[ ] requirements regarding priority documents

A copy of this Notification is being sent to the receiving Office and to the International Searching Authority.





The International Bureau of WIPO
34, chemin des Colombettes
1211 Geneva 20, Switzerland

Authorized officer:
Y. KUWAHARA

Facsimile No. (41-22) 740.14.35
Telephone No. (41-22) 338.83.38





Form PCT/IB/301 (July 1998) 
ANNEX TO FORM PCT/IB/301



International application No. 

PCT/JP01/01982 



INFORMATION ON TIME LIMITS FOR ENTERING THE NATIONAL PHASE 

The applicant is reminded that the "national phase" must be entered before each of the designated Offices indicated in the
Notification of Receipt of Record Copy (Form PCT/IB/301) by paying national fees and furnishing translations, as prescribed by
the applicable national laws.

The time limit for performing these procedural acts is 20 MONTHS from the priority date or, for those designated States
which the applicant elects in a demand for international preliminary examination or in a later election, 30 MONTHS from the
priority date, provided that the election is made before the expiration of 19 months from the priority date. Some designated (or
elected) Offices have fixed time limits which expire even later than 20 or 30 months from the priority date. In other Offices an
extension of time or grace period, in some cases upon payment of an additional fee, is available.

In addition to these procedural acts, the applicant may also have to comply with other special requirements applicable in
certain Offices. It is the applicant's responsibility to ensure that the necessary steps to enter the national phase are taken in a
timely fashion. Most designated Offices do not issue reminders to applicants in connection with the entry into the national
phase.

For detailed information about the procedural acts to be performed to enter the national phase before each designated
Office, the applicable time limits and possible extensions of time or grace periods, and any other requirements, see the relevant
Chapters of Volume II of the PCT Applicant's Guide. Information about the requirements for filing a demand for international
preliminary examination is set out in Chapter IX of Volume I of the PCT Applicant's Guide.

GR and ES became bound by PCT Chapter II on 7 September 1996 and 6 September 1997, respectively, and may, therefore,
be elected in a demand or a later election filed on or after 7 September 1996 and 6 September 1997, respectively, regardless of
the filing date of the international application. (See second paragraph above.)

Note that only an applicant who is a national or resident of a PCT Contracting State which is bound by Chapter II has 
the right to file a demand for international preliminary examination. 

CONFIRMATION OF PRECAUTIONARY DESIGNATIONS 

This notification lists only specific designations made under Rule 4.9(a) in the request. It is important to check that these
designations are correct. Errors in designations can be corrected where precautionary designations have been made under
Rule 4.9(b). The applicant is hereby reminded that any precautionary designations may be confirmed according to Rule 4.9(c)
before the expiration of 15 months from the priority date. If it is not confirmed, it will automatically be regarded as withdrawn
by the applicant. There will be no reminder and no invitation. Confirmation of a designation consists of the filing of a notice
specifying the designated State concerned (with an indication of the kind of protection or treatment desired) and the payment
of the designation and confirmation fees. Confirmation must reach the receiving Office within the 15-month time limit.

REQUIREMENTS REGARDING PRIORITY DOCUMENTS 

For applicants who have not yet complied with the requirements regarding priority documents, the following is recalled. 

Where the priority of an earlier national, regional or international application is claimed, the applicant must submit a copy
of the said earlier application, certified by the authority with which it was filed ("the priority document") to the receiving Office
(which will transmit it to the International Bureau) or directly to the International Bureau, before the expiration of 16 months from
the priority date, provided that any such priority document may still be submitted to the International Bureau before the date of
international publication of the international application, in which case that document will be considered to have been received
by the International Bureau on the last day of the 16-month time limit (Rule 17.1(a)).

Where the priority document is issued by the receiving Office, the applicant may, instead of submitting the priority
document, request the receiving Office to prepare and transmit the priority document to the International Bureau. Such request
must be made before the expiration of the 16-month time limit and may be subjected by the receiving Office to the payment
of a fee (Rule 17.1(b)).

If the priority document concerned is not submitted to the International Bureau or if the request to the receiving Office
to prepare and transmit the priority document has not been made (and the corresponding fee, if any, paid) within the applicable
time limit indicated under the preceding paragraphs, any designated State may disregard the priority claim, provided that no
designated Office may disregard the priority claim concerned before giving the applicant an opportunity to furnish the priority
document within a time limit which is reasonable under the circumstances.

Where several priorities are claimed, the priority date to be considered for the purposes of computing the 16-month time
limit is the filing date of the earliest application whose priority is claimed.
From the INTERNATIONAL BUREAU

PCT

SUBMISSION OR TRANSMITTAL
OF PRIORITY DOCUMENT

(PCT Administrative Instructions, Section 411) 


To: 

KOIKE, Akira 

No.11 Mori Bldg., 6-4, Toranomon 2- 
chome 

Minato-ku, Tokyo 105-0001
JAPON 


Date of mailing (day/month/year) 
03 April 2001 (03.04.01) 


Applicant's or agent's file reference 
SK01PCT25 


IMPORTANT NOTIFICATION 


International application No. 
PCT/JP01/01982 


International filing date (day/month/year) 
13 March 2001 (13.03.01) 


International publication date (day/month/year) 
Not yet published 


Priority date (day/month/year) 

13 March 2000 (13.03.00) 


Applicant 

SONY CORPORATION et al 





1. The applicant is hereby notified of the date of receipt (except where the letters "NR" appear in the right-hand column) by the
International Bureau of the priority document(s) relating to the earlier application(s) indicated below. Unless otherwise
indicated by an asterisk appearing next to a date of receipt, or by the letters "NR", in the right-hand column, the priority
document concerned was submitted or transmitted to the International Bureau in compliance with Rule 17.1(a) or (b).



2. This updates and replaces any previously issued notification concerning submission or transmittal of priority documents. 

3. An asterisk (*) appearing next to a date of receipt, in the right-hand column, denotes a priority document submitted
or transmitted to the International Bureau but not in compliance with Rule 17.1(a) or (b). In such a case, the attention
of the applicant is directed to Rule 17.1(c) which provides that no designated Office may disregard the priority claim
concerned before giving the applicant an opportunity, upon entry into the national phase, to furnish the priority document
within a time limit which is reasonable under the circumstances.

4. The letters "NR" appearing in the right-hand column denote a priority document which was not received by the International 
Bureau or which the applicant did not request the receiving Office to prepare and transmit to the International Bureau, 
as provided by Rule 17.1(a) or (b), respectively. In such a case, the attention of the applicant is directed to Rule 17.1(c) which 
provides that no designated Office may disregard the priority claim concerned before giving the applicant an opportunity, 
upon entry into the national phase, to furnish the priority document within a time limit which is reasonable under the 
circumstances. 



Priority date               Priority application No.   Country or regional Office   Date of receipt
                                                       or PCT receiving Office      of priority document

13 March 2000 (13.03.00)    2000-068720                JP                           26 March 2001 (26.03.01)
16 May 2000 (16.05.00)      60/204,729                 US                           26 March 2001 (26.03.01)



The International Bureau of WIPO
34, chemin des Colombettes
1211 Geneva 20, Switzerland

Authorized officer:
Y. KUWAHARA

Facsimile No. (41-22) 740.14.35
Telephone No. (41-22) 338.83.38
From the INTERNATIONAL BUREAU



PCT 

NOTICE INFORMING THE APPLICANT OF THE 
COMMUNICATION OF THE INTERNATIONAL 
APPLICATION TO THE DESIGNATED OFFICES 

(PCT Rule 47.1(c), first sentence) 


To: 

KOIKE, Akira 

No. 11 Mori Bldg., 6-4, Toranomon 2- 
chome 

Minato-ku, Tokyo 105-0001 

JAPON


Date of mailing (day/month/year) 

20 September 2001 (20.09.01) 


Applicant's or agent's file reference 
SK01PCT25 


IMPORTANT NOTICE 


International application No. International filing date (day/month/year) Priority date (day/month/year) 
PCT/JP01/01982 13 March 2001 (13.03.01) 13 March 2000 (13.03.00) 


Applicant 

SONY CORPORATION et al 



1. Notice is hereby given that the International Bureau has communicated, as provided in Article 20, the international application
to the following designated Offices on the date indicated above as the date of mailing of this Notice:

KR,US

In accordance with Rule 47.1(c), third sentence, those Offices will accept the present Notice as conclusive evidence that
the communication of the international application has duly taken place on the date of mailing indicated above and no copy
of the international application is required to be furnished by the applicant to the designated Office(s).

2. The following designated Offices have waived the requirement for such a communication at this time: 

AU,CA,CN,EP,JP 



The communication will be made to those Offices only upon their request. Furthermore, those Offices do not require the
applicant to furnish a copy of the international application (Rule 49.1(a-bis)).

3. Enclosed with this Notice is a copy of the international application as published by the International Bureau on
20 September 2001 (20.09.01) under No. WO 01/69936

REMINDER REGARDING CHAPTER II (Article 31(2)(a) and Rule 54.2) 

If the applicant wishes to postpone entry into the national phase until 30 months (or later in some Offices) from the priority
date, a demand for international preliminary examination must be filed with the competent International Preliminary
Examining Authority before the expiration of 19 months from the priority date.
It is the applicant's sole responsibility to monitor the 19-month time limit.

Note that only an applicant who is a national or resident of a PCT Contracting State which is bound by Chapter II has the 
right to file a demand for international preliminary examination. 

REMINDER REGARDING ENTRY INTO THE NATIONAL PHASE (Article 22 or 39(1))

If the applicant wishes to proceed with the international application in the national phase, he must, within 20 months
or 30 months, or later in some Offices, perform the acts referred to therein before each designated or elected Office.

For further important information on the time limits and acts to be performed for entering the national phase, see the
Annex to Form PCT/IB/301 (Notification of Receipt of Record Copy) and Volume II of the PCT Applicant's Guide.
DESCRIPTION

Method and Apparatus for Generating Compact Transcoding Hints Metadata 
Technical Field 

The present invention relates to an audio/video (or audiovisual, "A/V") signal 
processing method and an A/V signal processing apparatus for extracting a compact 
representation of a multimedia description and transcoding hints metadata for 
transcoding between different (e.g., MPEG) compressed content representations, 
manipulating (e.g., MPEG compressed) bitstream parameters such as frame rate, bit 
rate, session size, quantization parameters, and picture coding type structure, such as 
group of pictures, or "GOP", classifying A/V content, and retrieving multimedia 
information. 

Background Art 

A/V content is increasingly being transmitted over optical, wireless, and wired 
networks. Since these networks are characterized by different network bandwidth 
constraints, there is a need to represent A/V content by different bit rates resulting in 
varying subjective visual quality. Additional requirements on the compressed 
representation of A/V content are imposed by the screen size, computational 
capabilities, and memory constraints of an A/V terminal. 

Therefore, A/V content stored in a compressed format, e.g., as defined by the
Moving Pictures Experts Group ("MPEG"), must be converted to, e.g., different bit
rates, frame rates, screen sizes, and in accordance with varying decoding complexities 
and memory constraints of different A/V terminals. 

To avoid the need for storing multiple compressed representations of the same 
A/V content for different network bandwidths and different A/V terminals, A/V 
content stored in a compressed MPEG format may be transcoded to a different MPEG 
format. 

With respect to video transcoding, reference is made to the following:

WO09838800A1: O. H. Werner, N. D. Wells, M. J. Knee: Digital Compression
Encoding with improved quantization, 1999, proposes an adaptive quantization
scheme;

US5870146: Zhu; Qin-Fan: Device and method for digital video transcoding,
1999;

WO09929113A1: Nilsson, Michael, Erling; Ghanbari, Mohammed: Transcoding,
1999;

US5805224: Keesman; Gerrit J, Van Otterloo; Petrus J.: Method and Device for
Transcoding Video Signal, 1998;

WO09943162A1: Golin, Stuart, Jay: Motion vector extrapolation for transcoding
video sequences, 1999;

US5838664: Polomski; Mark D.: Video teleconferencing system with digital
transcoding, 1998;

WO09957673A2: Balliol, Nicolas: Transcoding of a data stream, 1999;

US5808570: Bakhmutsky; Michael: Device and Method for pair-matching
Huffman-Transcoding and high performance variable length decoder with two-word
bitstream segmentation which utilizes the same, 1998;

WO09905870A2: Lemaguet, Yann: Method of Switching between Video
Sequences and corresponding Device, 1999; and

WO09923560A1: LUDWIG, Lester; BROWN, William; YUL, Inn, J.; VUONG,
Anh, T.; VANDERLIPPE, Richard; BURNETT, Gerald; LAUWERS, Chris; LUI,
Richard; APPLEBAUM, Daniel: Scalable networked multimedia system and
application, 1999.

However, none of these patents on video transcoding discloses or suggests using
transcoding hints metadata information to facilitate A/V transcoding.

The Society of Motion Picture and Television Engineers ("SMPTE") proposed a standard
for Television on MPEG-2 Video Recoding Data Set (327M-2000), which provides for
re-encoding metadata using 256 bits for every macroblock of the source format.
However, this extraction and representation of transcoding hints metadata has several
disadvantages. For example, according to the proposed standard, transcoding hints
metadata (such as GOP structure, quantizer settings, motion vectors, etc.) is extracted
for every single frame and macroblock of the A/V source content. This method has
the advantage of providing detailed and content-adaptive transcoding hints and facilitates
transcoding while widely preserving the subjective A/V quality. However, the size of
the transcoding hints metadata is very large. In one specific implementation of the
proposed standard, 256 bits of transcoding hints metadata are stored per macroblock
of MPEG video. This large amount of transcoding hints metadata is not feasible for,
say, broadcast distribution to a local (e.g., home) A/V content server. Consequently,
the proposed standard on transcoding hints metadata is limited to broadcast studio
applications.
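The scale of this overhead can be estimated with a short calculation. The following Python sketch is illustrative only: the 256 bits per macroblock come from the text above, while the 720x480 frame size and 25 fps are assumptions borrowed from the example broadcast stream described later in this specification, and the function names are hypothetical.

```python
def macroblocks_per_frame(width, height, mb_size=16):
    # MPEG macroblocks cover 16x16 luma samples; partial blocks are rounded up.
    cols = -(-width // mb_size)
    rows = -(-height // mb_size)
    return cols * rows


def hints_bit_rate(width, height, fps, bits_per_macroblock=256):
    # Metadata bit rate when hints are stored for every macroblock of every frame.
    return macroblocks_per_frame(width, height) * bits_per_macroblock * fps


# Assumed example stream: 720x480 pel at 25 frames per second.
rate = hints_bit_rate(720, 480, 25)
print(f"{rate / 1e6:.2f} Mbit/s of transcoding hints metadata")
```

Under these assumptions the per-macroblock metadata alone would amount to 8.64 Mbit/s, which is more than the roughly 5 Mbit/s of the example MPEG-2 broadcast video itself.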

Another technique for transcoding hints metadata extraction and representation 
includes collecting general transcoding hints metadata for the transcoding of 
compressed A/V source content with a specific bit rate to another compressed format 
and bit rate. However, this technique is disadvantageous in not taking the 
characteristic properties of the transcoded content into account. For example, in the 
source content, the A/V characteristics may change from an A/V segment with limited 
amount of motion and few details (e.g., a news anchor scene) to another A/V segment 
depicting fast motion and numerous details (e.g., a sports event scene). According to 
this technique, misleading transcoding hints metadata, which would not suitably 
represent the different characteristics of both video segments, would be selected and, 
therefore, result in poor A/V quality and faulty bit rate allocation. 
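The effect of a single general hint on bit rate allocation may be illustrated as follows. This Python sketch is not taken from the cited technique or from the invention; the complexity values, the frame counts, and the proportional allocation rule are hypothetical assumptions chosen only to show how content-independent hints can misallocate bits between a low-activity and a high-activity segment.

```python
# Hypothetical segments; "complexity" is an arbitrary relative measure.
segments = [
    {"name": "news anchor", "frames": 250, "complexity": 1.0},
    {"name": "sports event", "frames": 250, "complexity": 4.0},
]


def allocate_bits(segments, total_bits, per_segment_hints):
    """Split total_bits across segments.

    With per-segment hints, each segment is weighted by frames * complexity.
    With one general hint, every frame is treated alike.
    """
    if per_segment_hints:
        weights = [s["frames"] * s["complexity"] for s in segments]
    else:
        weights = [s["frames"] for s in segments]
    total_weight = sum(weights)
    return [total_bits * w / total_weight for w in weights]


general = allocate_bits(segments, 10_000_000, per_segment_hints=False)
adaptive = allocate_bits(segments, 10_000_000, per_segment_hints=True)
for seg, g, a in zip(segments, general, adaptive):
    print(f"{seg['name']}: general {g:.0f} bits, content-adaptive {a:.0f} bits")
```

With these assumed numbers, the general hint gives both segments the same budget, whereas a content-adaptive description would shift most of the budget to the high-activity segment.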



Disclosure of the Invention

In view of the foregoing, it is an object of the present invention to provide a
method and apparatus for extracting a compact and A/V-content adaptive multimedia
description and transcoding hints metadata representation.

It is another object of the invention to provide a transcoding method and
apparatus that allow for real-time execution without significant delay or prohibitive
computational complexity, which is one of the requirements for a transcoding method. A second
requirement for a transcoding method is to preserve the subjective A/V quality as much
as possible. To facilitate a transcoding method that fulfills both of these requirements
for various compressed target formats, transcoding hints metadata may be generated
in advance and stored separately or together with the compressed A/V content. It is a
further object of this invention to provide a highly compact representation to reduce
storage size and to facilitate distribution (e.g., broadcast to a local A/V content server)
of multimedia description and transcoding hints metadata.

It is, thus, an object of the invention to provide a transcoding system that: 1) 
preserves the A/V quality through the transcoding process, and 2) limits the 
computational complexity in order to enable real-time applications with minimal delay. 
In accordance with an embodiment of the invention, additional data (metadata)
covering transcoding hints may be associated with the compressed A/V content.

Other objects and advantages of the invention will in part be obvious and will 
in part be apparent from the specification and the drawings. 

The present invention is directed to an apparatus and method that provide automatic
transcoding hints metadata extraction and compact representation.

The present invention is in the field of transcoding compressed A/V content 
from one compressed format into A/V content of another format by using supporting 
transcoding metadata. The term transcoding includes, but is not limited to, changing
the compressed format (e.g., conversion from MPEG-2 format to MPEG-4 format),
frame-rate conversion, bit-rate conversion, session-size conversion, screen-size
conversion, picture coding type conversions, etc.

The present invention may also be applied to automatic video classification 
using the aforementioned transcoding hints states as classes of different scene activity 
in video. 

The invention accordingly comprises the several steps and the relation of one 
or more of such steps with respect to each of the others, and the apparatus embodying 
features of construction, combination(s) of elements and arrangement of parts that are 
adapted to effect such steps, all as exemplified in the following detailed disclosure, and 
the scope of the invention will be indicated in the claims. 

Brief Description of the Drawings 

For a more complete understanding of the invention, reference is made to the
following description and accompanying drawing(s), in which:

Fig. 1 depicts a system overview of a transcoding system in a home network 
with various A/V terminals in accordance with an embodiment of the invention; 

Fig. 2 illustrates the transcoding hints extraction (Group of Pictures, "GOP") in
accordance with an embodiment of the invention;

Fig. 3 illustrates an example for the selection of transcoding states depending 
on the number of new feature points per frame according to an embodiment of the 
invention; 

Fig. 4 shows an example of a transcoding hints state diagram with 3 states 
according to an embodiment of the invention; 

Fig. 5 illustrates the transcoding hints metadata extraction from compressed and 
uncompressed source content in accordance with an embodiment of the invention; 

Fig. 6 shows a video segmentation and transcoding hints state selection process 
in accordance with an embodiment of the invention; 

Fig. 7 shows a method of determining the boundaries of a new video segment 
(or new GOP) in accordance with an embodiment of the invention; 

Fig. 8 shows an algorithm on how to select the transcoding hints state in 
accordance with an embodiment of the invention; 

Fig. 9 provides an overview of a structural organization of transcoding hints 
metadata in accordance with an embodiment of the invention; 

Fig. 10 depicts a structural organization of a general transcoding hints metadata 
description scheme according to an embodiment of the invention; 

Fig. 11 depicts the transcoding hints metadata for source format definition 
according to an embodiment of the invention; 

Fig. 12 depicts the transcoding hints metadata for target format definition
according to an embodiment of the invention;

Fig. 13 depicts the general transcoding hints metadata representation according 
to an embodiment of the invention; 

Fig. 14 depicts the segment-based transcoding hints metadata representation 
according to an embodiment of the invention; 

Fig. 15 depicts the encoding complexity transcoding hints metadata according 
to an embodiment of the invention; and 

Fig. 16 depicts the transcoding hints state metadata according to an embodiment 
of the invention. 

Best Mode for Carrying out the Invention 

Fig. 1 depicts a general overview of a system 100 for transcoding in a home
network environment in accordance with an embodiment of the invention. As shown
in Fig. 1, an A/V content server 102 includes an A/V content storage 103, an A/V
transcoding unit 106, a transcoding hints metadata extraction unit 104, and an A/V
transcoding hints metadata storage buffer 105. A/V content storage 103 stores
compressed A/V material from various sources with varying bit rates and varying
subjective quality. For example, A/V content storage 103 may contain home video
from a portable Digital Video ("DV") video camera 111, MPEG-4 compressed video
with very low bit rates (of, say, 10 kbit/s) from an MPEG-4 Internet camera 112, and
MPEG-2 Main Profile at Main Level ("MP@ML") compressed broadcast video of
around 5 Mbit/s from a broadcast service 101, which is in some cases already
associated with transcoding hints metadata. A/V content server 102 may also contain
high definition compressed MPEG video at considerably higher bit rates.

As shown in Fig. 1, A/V content server 102 is connected to a network 113,
which may be a wire-based or wireless home network. Several A/V terminals with
different characteristics may also be attached to network 113, including, but not limited
to: a wireless MPEG-4 A/V personal digital assistant ("PDA") 107, a high resolution
A/V terminal for high definition television entertainment 108, an A/V game console
109, and an International Telecommunication Union Telecommunication Standardization
Sector ("ITU-T") based videophone 110. The A/V terminals 107, 108, 109, and 110 may be
attached with different bit rate transmission capabilities (due to cable or radio link) to
home network 113.

Furthermore, wireless video PDA 107, for example, may be limited in terms of 
computational power, storage memory, screen size, video frame rate, and network bit 
rate. Therefore, A/V transcoding unit 106 may transcode, for example, 5 Mbit/s 
MPEG-2 broadcast video at European 25 frames per second ("fps") and 720x 480 pel 
contained in A/V content server 102 to an MPEG-4 500 kbit/s 15 fps video for wireless 
transmission and display on a 352x240 pel display by wireless MPEG-4 video PDA 
107. A/V transcoding unit 106 may use the transcoding hints metadata from buffer 105 
to transcode, in real time, the compressed source bit rate of the A/V content to the 
capabilities of each specific target A/V terminal 107, 108, 109, and 110. The 
transcoding hints metadata are generated in transcoding hints metadata extraction unit 
104 or they may be distributed by a broadcast service 101. 

As shown in Fig. 1, a compressed bitstream in a source format (hereinafter "first 
bitstream") 116 is transferred from A/V content buffer 103 to A/V transcoding unit 
106. A bitstream in a target format (hereinafter "second bitstream") 115 is transferred 
after transcoding in transcoding unit 106 to home network 113. From home network 
113, content in, e.g., compressed DV format is stored in A/V content storage 103 via 
link 114. 

Fig. 2 illustrates the transcoding hints extraction, transcoding hints storage, and 
transcoding process in accordance with an embodiment of the invention. As shown in 
Fig. 2, a buffer 201 contains A/V content in a source format. A buffer 202 contains a 
description of the source format, such as bit rate, compression method, GOP structure, 
screen size, interlaced or progressive format, etc. A buffer 203 contains a description 
of a target format, such as bit rate, compression method, GOP structure, screen size, 
interlaced or progressive format, etc. A transcoding hints extraction unit 207 reads the 
A/V content in compressed source format from A/V buffer 201, as well as the source 
format description from buffer 202 and the transcoding target format description from 
buffer 203. After the transcoding hints are calculated by transcoding hints extraction 
unit 207, the transcoding hints are stored in a transcoding hints metadata buffer 206. 
An A/V transcoding unit 205 reads first bitstream 204 in the source format from A/V 
content buffer 201 and transforms the source format into the target format by means 
of the transcoding hints metadata stored in buffer 206. A/V transcoding unit 205 
outputs second bitstream 208 in the new compressed target format to an A/V target 
format buffer 209 for storage. 

Figs. 3 and 4 illustrate the principle of transcoding hints metadata organization 
in accordance with an embodiment of the invention. MPEG-based video compression 
uses a predictive method, where changes between successive frames are encoded. 
Video content with a large number of changes from one frame to the next 
requires (to maintain the subjective quality while limiting the bit rate) different 
re-encoding parameter settings than video content with small changes between frames. 
Therefore, it is important to decide in advance on the re-encoding parameters. The 
transcoding hints metadata selection mainly depends on the amount and characteristics of 
unpredictable visual content. New visual content may not be predictable from 
previous frames and may have to be encoded, at a high bit rate cost, using 
DCT-coefficients. As such, the inventive method uses the number of new feature 
points, i.e., those that are not tracked from a previous frame to a current frame, 
to determine the amount of new content per frame. 
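
The counting of new feature points may be sketched as follows. This is a minimal illustration only: it assumes that every tracked feature point carries an integer track identifier, which is a hypothetical representation that the description does not prescribe, and the function name is likewise chosen for illustration.

```python
from typing import List, Set


def count_new_feature_points(tracks_per_frame: List[Set[int]]) -> List[int]:
    """Return, per frame, the number of feature points that were not
    tracked from the previous frame (hypothetical track-ID representation)."""
    counts = []
    previous: Set[int] = set()
    for current in tracks_per_frame:
        # A point counts as "new" if its track ID was absent in the previous frame.
        counts.append(len(current - previous))
        previous = current
    return counts


# Three frames: a still scene, one point entering, then entirely new content.
frames = [{1, 2, 3}, {1, 2, 3, 4}, {7, 8, 9}]
print(count_new_feature_points(frames))  # [3, 1, 3]
```

In this sketch all points of the first frame count as new, because no previous frame exists to track them from.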

Fig. 3 depicts a graph of the number of new feature points per frame depending 
on the frame number of a video (horizontal axis, time axis). Section 301 is a part of 
a video segment where only a very small amount of new content appears between 
succeeding frames, and therefore respective transcoding hints metadata (e.g., large 
GOP size, low frame rate, low bit rate, ...) may be chosen. Section 302 includes a 
slightly higher number of new feature points per frame, which means that a state 
describing transcoding hints metadata is chosen, which provides optimum transcoding 
parameters for this situation (e.g., slightly smaller GOP size, higher bit rate). Section 
303 depicts a transcoding hints metadata state with a high number of new feature points 
per frame, and therefore a high amount of new content per scene. As such, a smaller 
M value (I/P-frame distance) and a higher bit rate are chosen. 

Fig. 4 depicts an example of the basic organization of a transcoding hints 
metadata state diagram consisting of three discrete transcoding hints metadata states. 
Every discrete transcoding state may contain metadata for GOP structure, quantizer 
parameters, bit rate, screen size, etc. These transcoding hint parameters may have a 
fixed value or may be a function of another parameter. For example, the GOP length 
may be a discrete function of the number of new feature points per frame and the 
quantizer parameters may be a function of the edge and texture activity derived from 
the DCT coefficients. Each of the three transcoding hints metadata states in this 
example may be selected to accommodate three different encoding situations. As 
shown in Fig. 4, state "3" 403 is selected for a high amount of motion and low amount 
of new content per frame and represents the optimum state for transcoding hints 
metadata for such content. State "2" 402 is selected for low amount of motion and 
high amount of content with high edge activity, which may require a high number of 
bits to be spent. State "1" 401 is, for example, selected to accommodate the 
transcoding process for A/V content with low scene activity. There are also other 
special transcoding hint metadata states provided for video editing effects, like 
different crossfading effects, abrupt scene changes, or black pictures between two 
scenes. The location of the video editing effects may be detected manually, 
semi-automatically, or fully automatically. 
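
A transcoding hints state whose parameters are either fixed values or functions of another parameter may be represented, for example, as in the following sketch. The class name, the field names, and all numeric values are assumptions chosen for illustration; only the idea of "fixed value or function of another parameter" is taken from the description of Fig. 4.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Union

# A hint parameter is either a fixed value or a function of a measured quantity
# (here: the number of new feature points per frame).
Param = Union[int, Callable[[float], int]]


@dataclass
class TranscodingHintState:
    name: str
    gop_length: Param
    bit_rate_kbps: Param

    def resolve(self, new_points_per_frame: float) -> Dict[str, int]:
        """Evaluate every parameter for one measured value."""
        def value(p: Param) -> int:
            return p(new_points_per_frame) if callable(p) else p

        return {
            "gop_length": value(self.gop_length),
            "bit_rate_kbps": value(self.bit_rate_kbps),
        }


# Illustrative values only; they do not come from the description.
low_activity = TranscodingHintState("1", gop_length=15, bit_rate_kbps=300)
adaptive = TranscodingHintState(
    "3",
    gop_length=lambda n: 12 if n < 20 else 6,       # discrete function of new points
    bit_rate_kbps=lambda n: 500 if n < 20 else 800,
)
print(low_activity.resolve(35.0))
print(adaptive.resolve(35.0))
```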

Fig. 5 illustrates the transcoding hints metadata extraction from compressed and 
uncompressed source content in accordance with an embodiment of the invention. As 
shown in Fig. 5, a system 500 includes an A/V source content buffer 501, a source 
format description buffer 502, and a target format description buffer 503. 

A memory 504 is included for storing the motion vector, DCT-coefficient, and 
feature point extraction from compressed or uncompressed domains. In the 
compressed domain, motion vectors from P- and B-macroblocks may be directly 
extracted from a bitstream. However, there are no motion vectors for 
Intra-macroblocks. Therefore, the motion vectors obtained for B- and P-macroblocks 
may be interpolated for I-macroblocks (see Roy Wang, Thomas Huang: "Fast Camera 
motion Analysis in MPEG domain", IEEE International Conference on Image 
Processing, ICIP 99, Kobe, Japan, Oct 1999). DCT coefficients for blocks of 
Intra-macroblocks may be directly extracted from a bitstream. For P- and 
B-macroblocks, a limited number of DCT-coefficients (DC and 2 AC coefficients) may 
be obtained by the method described by Shih-Fu Chang, David G. Messerschmitt: 
"Manipulation and Composition of MC-DCT compressed video", IEEE Journal on 
Selected Areas in Communications, vol. 8, 1996. Exemplary methods of compressed 
domain feature point extraction and motion estimation are disclosed in the patent by 
Peter Kuhn: "Method and Apparatus for compressed domain feature point registration 
and motion estimation", PCT patent, December 1999, which is incorporated herein by 
reference. In some cases, the A/V source content may only be available in 
uncompressed format or in a compression format that is not based on the DCT and 
motion compensation principle, which is employed by MPEG-1, MPEG-2, MPEG-4, 
ITU-T H.261, and ITU-T H.263. For the DV format, it may be the case that only the 
DCT-coefficients are available. In these cases, motion vectors may be obtained by 
motion estimation methods, cf., e.g., Peter Kuhn: "Algorithms, Complexity Analysis 
and VLSI Architectures for MPEG-4 Motion Estimation", Kluwer Academic 
Publishers, 1999. DCT-coefficients may be obtained by performing a block-based 
DCT-transform, cf. K.R. Rao, P. Yip: "Discrete Cosine Transform - Algorithms, 
Advantages, Applications", Academic Press 1990. Feature points in pel domain 
(uncompressed domain) may be obtained for example by the method described by 
Bruce D. Lucas, Takeo Kanade: "An iterative registration technique with an 
application to stereo vision", International Joint Conference on Artificial Intelligence, 
pp. 674-679, 1981. 

A motion analysis part 505 extracts the parameters of a parametric motion 
model from the motion vector representation in memory 504. Parametric motion 
models may have 6 or 8 parameters, and parametric motion estimation may be 
obtained by methods described in M. Tekalp: "Digital Video Processing", Prentice 
Hall, 1995. The goal of using a motion representation is to eliminate the motion 
estimation in the transcoder for delay and speed reasons. Therefore, the input 
representation of motion from the source bitstream may be used to derive the output 
representation (target bitstream). For example, screen-size resizing, 
interlaced-progressive conversion, etc., may rely heavily on the motion representation. 
The parameters of the motion representation may also be used for coding decisions on 
GOP structure. A texture/edge analysis part 506 may be based on the 
DCT-coefficients extracted from the bitstream, e.g., K.R. Rao, P Yip: "Discrete Cosine 
Transform - Algorithms, Advantages, Applications", Academic Press 1990, or K.W. 
Chun, K.W. Lim, H. D. Cho, J.B. Ra: "An adaptive perceptual quantization 
algorithm for video encoding", IEEE Transactions on Consumer Electronics, Vol. 39, 
No. 3, August 1993. 

A feature point tracking part 507 for the compressed domain may employ a 
technique described in Peter Kuhn: "Method and Apparatus for compressed domain 
feature point registration and motion estimation", PCT patent, December 1999, which 
is incorporated herein by reference. A processor 508 calculates the number of new 
feature points per frame. A processor 509 calculates the temporal video segmentation, 
and a processor 510 calculates the transcoding hints state for every segment. Methods 
for these calculations according to an embodiment of the invention will be described 
in detail below with reference to Fig. 6, Fig. 7, and Fig. 8. 

A memory 511 contains the motion-related transcoding hints metadata. A 
memory 512 contains the texture/edge related transcoding hints metadata, and a 
memory 513 contains the feature point transcoding hints metadata, all of which will be 
described in detail below with reference to Fig. 15. A memory 514 contains video 
segment transcoding hints selection metadata, which will be described with reference 
to Fig. 16. The automatic extraction, compact representation, and usage of the 
transcoding hints metadata will now be described. 

Fig. 6 discloses a video segmentation and transcoding hints state selection 
process in accordance with an embodiment of the invention. At step 601, some 
variables are initialized. The variable "frame" is the current frame number of the 
source bitstream, and "nframes" is the number of frames within the new video segment 
(or GOP, group of pictures). The other variables are only of use within this routine. 
At step 602, the number of frames within the GOP is incremented. At step 603, it is 
determined whether a new segment/GOP starts within the frame, details of which will 
be discussed with reference to Fig. 7. If so ("yes"), control is passed to step 
604, otherwise, it is passed to step 615. At step 604, the variable "last_gop_start" is 
initialized with the value of "new_gop_start". At steps 608 and 609, the variable 
"last_gop_stop" is set to "frame-1" if the variable "frame" is larger than 1. Otherwise, 
at step 610, "last_gop_stop" is set to 1. Next, step 611, which is depicted in detail 
in Fig. 8, determines the transcoding hints state based on motion parameters 605, 
texture/edge parameters 606, and feature-point data 607. At step 612, the transcoding 
hints metadata are output to the transcoding hints metadata buffers. In accordance with 
an embodiment of the invention, the transcoding hints metadata comprises "nframes" 
(number of frames within the GOP), the transcoding hints state with all the parameters, 
and the start frame number of the new GOP ("new_gop_start"). After that, the variable 
"nframes" is set to 0 and the current frame number "frame" is given to the variable 
"new_gop_start". Then, at step 615, it is tested to determine if all frames of the source 
bitstream have been processed. If not ("no"), control is passed to step 614 where the 
frame number is incremented and the process is repeated starting from step 602. 
Otherwise, the process is terminated. 
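
The control flow of Fig. 6 may be transcribed, for illustration, as the loop below. It is a sketch under stated assumptions: the initial values at step 601 are not given in the description and are assumed here, and the two callables "starts_new_segment" (step 603, detailed in Fig. 7) and "select_state" (step 611, detailed in Fig. 8) are hypothetical placeholders supplied by the caller. The sketch follows the steps literally and does not attempt to resolve details the text leaves open.

```python
from typing import Callable, List, Tuple


def segment_video(
    total_frames: int,
    starts_new_segment: Callable[[int, int], bool],  # (frame, nframes) -> bool
    select_state: Callable[[int, int], object],      # (gop_start, gop_stop) -> state
) -> List[Tuple[int, object, int]]:
    """Return one (nframes, state, new_gop_start) record per closed segment."""
    # Step 601: initial values are assumed, the description does not list them.
    frame, nframes, new_gop_start = 1, 0, 1
    metadata = []
    while True:
        nframes += 1                                        # step 602
        if starts_new_segment(frame, nframes):              # step 603
            last_gop_start = new_gop_start                  # step 604
            last_gop_stop = frame - 1 if frame > 1 else 1   # steps 608 to 610
            state = select_state(last_gop_start, last_gop_stop)  # step 611
            metadata.append((nframes, state, new_gop_start))     # step 612
            nframes = 0
            new_gop_start = frame
        if frame >= total_frames:                           # step 615
            break
        frame += 1                                          # step 614
    return metadata


# Hypothetical predicate: close a segment once more than 3 frames were counted.
records = segment_video(8, lambda frame, n: n > 3, lambda start, stop: (start, stop))
print(records)
```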

Fig. 7 illustrates a method for determining the start frame and the end frame of 
a new video segment or GOP according to an embodiment of the invention. At step 
701, it is determined whether the variable "nframes" from Fig. 6 is an integer multiple 
of M (which is the I/P frame distance). If so, then "no" is selected and at step 702, it 
is determined whether the current frame number is the first frame. If so ("no"), control 
is passed to step 703 where it is determined whether "nframes" is greater than a 
minimum number of frames "gop_min" within a GOP. In case the result at step 702 
is "yes", a new GOP is started at step 705. In case the result at step 703 is "yes", a new 
GOP is started at step 705. In case the result at step 703 is "no", control is passed to 
step 704 where it is determined whether "nframes" is greater than a maximum number 
of frames "gop_max" within a GOP. In case the result at step 704 is "yes", the GOP 
is closed at step 706, otherwise, the process is terminated. 

Fig. 8 illustrates a process for selecting a transcoding hint state for a specific 
GOP or A/V segment taking only the number of new feature points per frame into 
account in accordance with an embodiment of the invention. Based on the basic idea 
illustrated, similar decision structures may be implemented using the aforementioned 
motion parameters from a parametric motion estimation as well as texture/edge 
parameters gained from DCT-coefficients. It is noted that the class of algorithms 
described may also be used to classify A/V material in terms of motion, edge activity, 
new content per frame, etc., leading to a higher level of A/V classification. In such 
cases, the transcoding hint states would represent specific classes of different content 
material. Referring now to Fig. 8, at step 801, variables "frame_no", "last_gop_start", 
"sum" and "new_seg" are initialized. The variable "frame_no" is given the contents of 
the "last_gop_start" parameter, and the variables "sum" and "new_seg" are initialized 
with zero. Then, at step 802, the contents of the variable "sum" are incremented by the 
number of new feature points of the current frame ("frame_no"). At step 803, it is 
determined whether the variable "frame_no" is less than the variable "last_gop_stop". 
If so ("yes"), step 802 is repeated, otherwise, control is passed to step 804. At step 
804, it is determined whether the value of the variable "sum" is less than one-eighth of 
a predetermined parameter "summax". The parameter "summax" is a constant that 
represents the maximum number of feature points that can be tracked from frame to 
frame multiplied by the number of frames between the frames "last_gop_start" and 
"last_gop_stop". It may have the value 200 according to an embodiment of the 
invention. If the result at step 804 is "yes", the transcoding hints state 1 is selected at 
step 806 for which the parameters are shown in Table 1 of Fig. 8. Otherwise, at step 
805, it is determined whether the value of the variable "sum" is less than one-quarter 
of the predetermined parameter "summax". If so ("yes"), the transcoding hints state 
2, as shown in Table 1, is selected at step 807. If not ("no"), the transcoding hints state 
3 (as shown in Table 1) is selected at step 808 and the process is terminated. It is noted 
that the decision thresholds in steps 804 and 805 depend on the definition and number 
of transcoding hints states. 
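
The state selection of Fig. 8 may be sketched as follows. The function name and the dictionary of per-frame counts are assumptions made for illustration, and it is assumed that "frame_no" is advanced by one each time step 802 is repeated. The parameter "summax" is passed in by the caller, because the text leaves its exact computation to the embodiment.

```python
from typing import Mapping


def select_hint_state(
    new_points: Mapping[int, int],  # frame number -> number of new feature points
    last_gop_start: int,
    last_gop_stop: int,
    summax: float,
) -> int:
    """Return transcoding hints state 1, 2, or 3 for one segment."""
    total = 0                                        # step 801: "sum" starts at zero
    for frame_no in range(last_gop_start, last_gop_stop + 1):
        total += new_points[frame_no]                # steps 802 and 803
    if total < summax / 8:                           # step 804
        return 1                                     # step 806
    if total < summax / 4:                           # step 805
        return 2                                     # step 807
    return 3                                         # step 808


# Illustrative counts; with summax = 200 the thresholds are 25 and 50.
counts = {1: 2, 2: 3, 3: 1, 4: 40, 5: 60}
print(select_hint_state(counts, 1, 3, 200))  # 1  (sum = 6)
print(select_hint_state(counts, 1, 4, 200))  # 2  (sum = 46)
print(select_hint_state(counts, 4, 5, 200))  # 3  (sum = 100)
```

With a different number of transcoding hints states, the thresholds one-eighth and one-quarter would have to be replaced accordingly.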

Transcoding Hints Metadata Description 

For metadata explanation, a pseudo C-code style may be used. The abbreviations 
D for Description and DS for Description Scheme, as defined in the emerging 
MPEG-7 metadata standard, may be used. 

Fig. 9 depicts a structural organization of transcoding hints metadata within a 
Generic A/V DS 901 in accordance with an embodiment of the invention. As shown 
in Fig. 9, Segment DS 904 and Media Info DS 902 are derived from Generic A/V DS 
901. Segment Decomposition 906 is derived from Segment DS 904, and Video 
Segment DS 907 and Moving Region DS 908 are derived from Segment 
Decomposition 906. Segment-based transcoding hints DS 909, which will be 
described in detail with reference to Fig. 14, is derived from Video Segment DS 907. 
Video Segment DS 907 accesses one or several transcoding hint state DS 911, which 
will be described in detail with reference to Fig. 16. From Moving Region DS 908, the 
Segment-based transcoding hints DS 910 for moving regions, which will be described 
in detail with reference to Fig. 14, is derived; it accesses one or several 
transcoding hint state DS 912, which will be described in detail with reference to Fig. 
16. From Media Info DS 902, Media Profile DS 903 is derived. From Media Profile 
DS 903, General Transcoding Hints DS 905, which will be described with reference 
to Fig. 10, is derived. 

Fig. 10 depicts the structural organization of Transcoding Hints DS 1001, which 
consists of one instance of the Source Format Definition DS 1002, which will be 
described with reference to Fig. 11, and one or several instances of Target Format 
Definition DS 1003, which will be described with reference to Fig. 12. Additionally, Transcoding 
Hints DS 1001 consists of one optional instance of General Transcoding Hints DS 
1004, which will be described with reference to Fig. 13, and one optional Transcoding 
Encoding Complexity DS 1005, which will be described with reference to Fig. 15. 

Fig. 11 depicts source format definition transcoding hints metadata (e.g., Source 
Format Definition DS 1002 in Fig. 10) which is associated to the whole A/V content 
or to a specific A/V segment, in accordance with an embodiment of the invention. As 
shown in Fig. 11, relevant Descriptors and Description Schemes may include: 

• bitrate is of type <int> and describes the bit rate per second of the source 
A/V data stream. 

• size_of_pictures is of type <2*int> and describes the size of picture of 
the source A/V format in x and y directions. 

• number_of_frames_per_second is of type <int> and describes the 
number of frames per second of the source content. 

• pel_aspect_ratio is of type <float> and describes the pel aspect ratio. 

• pel_colour_depth is of type <int> and describes the color depth. 

• usage_of_progressive_interlaced_format is of size <1 bit> and 
describes whether the source format is in progressive or in interlaced 
format. 

• usage_of_frame_field_pictures is of size <1 bit> and describes whether 
frame or field pictures are used. 

• compression_method is of type <int> and defines the compression 
method used for the source format and may be selected from a list that 
includes: MPEG-1, MPEG-2, MPEG-4, DV, H.263, H.261, etc. For 
every compression method, further parameters may be defined here. 

• GOP_structure is a run-length-encoded data field of the I,P,B-states. 
For example, in case there are only I-frames in an MPEG-2 video, direct 
conversion to the DV format in compressed domain is possible. 
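
The descriptors listed above may be gathered, for illustration, in a single record as sketched below. The numeric codes of the compression methods, the meaning assigned to the two 1-bit flags, and the example values for pel aspect ratio and colour depth are assumptions; only the field names and types follow the list above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple


class CompressionMethod(Enum):
    # Numeric codes are assumptions; the description only lists the methods.
    MPEG_1 = 0
    MPEG_2 = 1
    MPEG_4 = 2
    DV = 3
    H_263 = 4
    H_261 = 5


@dataclass
class SourceFormatDefinition:
    bitrate: int                                  # bits per second
    size_of_pictures: Tuple[int, int]             # (x, y) in pels
    number_of_frames_per_second: int
    pel_aspect_ratio: float
    pel_colour_depth: int
    usage_of_progressive_interlaced_format: bool  # True = progressive (assumed)
    usage_of_frame_field_pictures: bool           # True = frame pictures (assumed)
    compression_method: CompressionMethod
    GOP_structure: List[Tuple[str, int]]          # run-length pairs, e.g. ("I", 1)

    def is_intra_only(self) -> bool:
        """True if the GOP structure is non-empty and holds only I-frames."""
        return bool(self.GOP_structure) and all(
            frame_type == "I" for frame_type, _ in self.GOP_structure
        )


# The 5 Mbit/s MPEG-2 broadcast source mentioned with reference to Fig. 1.
broadcast = SourceFormatDefinition(
    bitrate=5_000_000, size_of_pictures=(720, 480), number_of_frames_per_second=25,
    pel_aspect_ratio=1.0, pel_colour_depth=8,
    usage_of_progressive_interlaced_format=False, usage_of_frame_field_pictures=True,
    compression_method=CompressionMethod.MPEG_2,
    GOP_structure=[("I", 1), ("B", 2), ("P", 1)],
)
print(broadcast.is_intra_only())  # False
```

The helper "is_intra_only" corresponds to the case noted above in which an MPEG-2 video containing only I-frames allows direct conversion to the DV format in the compressed domain.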

Fig. 12 depicts target format definition transcoding hints metadata, which may 
be associated to the whole A/V content or to a specific A/V segment, in accordance 
with an embodiment of the invention. As shown in Fig. 12, the relevant Descriptors 
and Description Schemes may include: 

• bitrate is of type <int> and describes the bit rate per second of the target 
A/V data stream. 

• size_of_pictures is of type <2*int> and describes the size of picture of 
the target A/V format in x and y directions. 

• number_of_frames_per_second is of type <int> and describes the 
number of frames per second of the target content. 

• pel_aspect_ratio is of type <float> and describes the pel aspect ratio. 

• pel_colour_depth is of type <int> and describes the color depth. 

• usage_of_progressive_interlaced_format is of size <1 bit> and 
describes whether the target format needs to be progressive or interlaced. 

• usage_of_frame_field_pictures is of size <1 bit> and describes whether 
frame or field pictures are used. 

• compression_method is of type <int> and defines the compression 
method used for the target format and may be selected from a list that 
includes: MPEG-1, MPEG-2, MPEG-4, DV, H.263, H.261, etc. For 
every compression method, further parameters may be defined here. 

• GOP_structure is an optional run-length-encoded data field of the 
I,P,B-states. With this optional parameter, a fixed GOP structure may be 
forced. A fixed GOP structure may be useful, for example, to force 
I-frames at certain locations to facilitate video editing. 
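
A run-length encoding of the I,P,B-states, as used by the GOP_structure field, may look as sketched below. The representation as (frame type, run length) pairs and the function names are assumptions; the description does not define the binary layout of the field.

```python
from itertools import groupby
from typing import List, Tuple


def rle_encode(frame_types: str) -> List[Tuple[str, int]]:
    """Collapse a frame-type sequence such as "IBBP" into (type, run) pairs."""
    return [(ftype, len(list(run))) for ftype, run in groupby(frame_types)]


def rle_decode(runs: List[Tuple[str, int]]) -> str:
    """Expand (type, run) pairs back into the frame-type sequence."""
    return "".join(ftype * count for ftype, count in runs)


gop = "IBBPBBP"
encoded = rle_encode(gop)
print(encoded)  # [('I', 1), ('B', 2), ('P', 1), ('B', 2), ('P', 1)]
print(rle_decode(encoded) == gop)  # True
```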

Fig. 13 depicts general transcoding hints metadata (e.g., General Transcoding 
Hints DS 1004 in Fig. 10), which may be associated to the whole A/V content or to a 
specific A/V segment, according to an embodiment of the invention. As shown in Fig. 
13, relevant Descriptors and Description Schemes may include: 

• use_region_of_interest_DS has a length of <1 bit> and indicates whether 
a region of interest description scheme is available as transcoding hints. 
In case the region_of_interest_DS is used, then a shape_D (which may 
be for example one of the following: boundary_box_D, MB_shape_D, 
or any other shape_D) together with a motion_trajectory_D may be 
used to spatially and temporally describe the region of interest. An 
MB_shape_D may use macroblock (16x16) sized blocks for object shape 
description. Motion_trajectory_D already includes a notion of time so 
that the start frame and the end frame of the region_of_interest_DS may 
be defined. The region_of_interest_DS may have the size of the 
respective shape_D and the respective motion_trajectory_D. For 
transcoding applications, the region_of_interest_DS may be used, for 
example, to spend more bits (or modify the quantizer, respectively) for 
the blocks within the region of interest than for the background. Another 
transcoding application to MPEG-4 may be to describe the region of 
interest by a separate MPEG-4 object and to spend a higher bit rate and 
a higher frame rate for the region of interest than for other MPEG-4 
objects like the background. The extraction of the 
region_of_interest_DS may be performed automatically or manually. 

• use_editing_effects_transcoding_hints_DS has a length of <1 bit> and 
indicates if information is available on editing-effects-based transcoding 
hints. 

• camera_flash is a list of entries where every entry describes the frame 
number where a camera flash occurs. Therefore, the length of the 
descriptor is the number of camera flash events multiplied by <int>. For 
transcoding applications, the camera_flash descriptor is very useful, as 
most of the video (re-)encoders/transcoders use a motion estimation 
method based on the luminance difference, cf. Peter Kuhn: "Algorithms, 
Complexity Analysis and VLSI Architectures for MPEG-4 motion 
estimation", Kluwer Academic Publishers, 1999. In case of a luminance- 
based motion estimation, the mean absolute error between two 
macroblocks of two subsequent frames (one with flash, one without 
flash) would be too high for prediction and the frame with the camera 
flash would have to be encoded as Intra-frame with high bit rate costs. 
Therefore, indicating the camera flash within a transcoding hints 
Description Scheme ("DS") allows for using, for example, a 
luminance-corrected motion estimation method or other means to predict the frame 
with the camera flash from the anchor frame(s) with moderate bit costs. 

• cross_fading is a list of entries where every entry describes the start 
frame and the end frame of a cross fading. Therefore, the length of this 
descriptor is two times <int> times the number of cross fading events. 
Indicating the cross fading events in transcoding hints metadata is very 
useful for controlling the bit rate/quantizer during the cross fading. 
During cross fading, prediction is generally of limited use, causing a bit 
rate increase for prediction error coding. Since the scene is usually 
blurred during cross fading, the bit rate increase may be limited by adjusting 
the quantizer scale, bit rate, or rate control parameters, respectively. 

• black_pictures is a list of entries where every entry describes the start 
frame and the end frame of a sequence of black pictures. Between 
scenes, especially in home video, black pictures may occur. 
Experimental results indicate that a series of black pictures increases 
the bit rate in motion-compensated DCT coders because the prediction 
is only of limited use. Therefore, this transcoding hints descriptor may 
be used to limit the bit rate during black pictures by adjusting the 
quantizer scale, bit rate, or rate control parameters, respectively. 

• fade_in is similar to cross fading, and is described as a number of 
entries determining the start frame and the end frame of a fade in. In 
comparison to cross fading, the fade in starts from black pictures, and, 
therefore, a kind of masking effect of the eye may be used to limit the bit 
rate during fade in by adjusting the quantizer_scale, bit rate, or rate 
control parameters, respectively. 

• fade_out is similar to fade_in, except that after a scene, a series of black 
pictures is described. 

• abrupt_change is described by a list of single frame numbers of type 
<int> indicating where abrupt scene or shot changes without fading 
appear. These events are indicated, for example, by the very high and 
sharp peaks in Fig. 3. These peaks indicate the beginning of a new 
camera shot or scene. The abrupt_change editing effect is in contrast to 
the fading effects. When abrupt changes between two video segments 
appear, then the human visual perception needs a few milliseconds to 
adapt and recognize the details of the new A/V segment. This slowness 
effect of the human eye may be used beneficially for video transcoding, 
for example, for reducing the bit rate or modifying the quantizer scale 
parameters for the first frames of a video segment after an abrupt change 
of a scene or shot. 

• use_motion_transcoding_hints_DS has a length of <1 bit> and 
indicates the use of motion-related transcoding hints metadata. 

• number_of_regions indicates the number of regions for which the 
following motion-related transcoding hints metadata are valid. 

• for_every_region: a field of <1 bit> length indicates whether the 
region is rectangular or arbitrarily-shaped. In case the region is 
arbitrarily-shaped, a region descriptor (consisting, e.g., of a shape 
descriptor and a motion trajectory descriptor) is used. In case of a 
rectangular region, the size of the rectangular region is used. The motion 
field within this region is described by a parametric motion model, which 
is determined by several parameters for every frame or sequence of 
frames. For transcoding, this motion representation of the real motion of 
the source video may be used to limit the search area of the 
computationally complex motion estimation of the (re-)encoding part, and 
also for fast and efficient interlaced/de-interlaced (frame/field) 
conversion and determining the GOP (Group of Pictures) structure 
depending on the amount of motion within the video. The motion 
representation may also be used beneficially for size conversion of the 
video. 
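
The use of the editing-effect descriptors for quantizer control may be illustrated by the sketch below, which raises the quantizer scale for the first frames after an abrupt change and during black pictures. The function name, the window of affected frames, and the size of the offset are hypothetical; the description states only that the quantizer scale, bit rate, or rate control parameters may be adjusted at these locations.

```python
from typing import List, Sequence, Tuple


def quantizer_offsets(
    nframes: int,
    abrupt_change: Sequence[int],               # frame numbers of abrupt changes
    black_pictures: Sequence[Tuple[int, int]],  # (start frame, end frame) pairs
    window: int = 3,                            # assumed number of frames after a cut
    offset: int = 2,                            # assumed quantizer-scale increase
) -> List[int]:
    """Return an additive quantizer-scale offset for frames 1..nframes."""
    offsets = [0] * (nframes + 1)  # index 0 is unused; frames are numbered from 1
    for start in abrupt_change:
        # Coarser quantization while the eye adapts to the new scene.
        for frame in range(start, min(start + window, nframes + 1)):
            offsets[frame] = max(offsets[frame], offset)
    for start, end in black_pictures:
        # Limit the bit rate spent on black pictures between scenes.
        for frame in range(start, min(end, nframes) + 1):
            offsets[frame] = max(offsets[frame], offset)
    return offsets[1:]


print(quantizer_offsets(8, abrupt_change=[3], black_pictures=[(7, 8)], window=2))
# [0, 0, 2, 2, 0, 0, 2, 2]
```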

Fig. 14 depicts the segment-based transcoding hints metadata (e.g., segment-based 
transcoding hints DS 909 and 910 in Fig. 9) which may be used to determine the 
(re-)encoder/transcoder settings for an A/V segment which exhibits constant 
characteristics, in accordance with an embodiment of the invention. As shown in Fig. 
14, relevant Descriptors and Description Schemes may include: 

• start_frame is of type <int> and describes the frame number of the 
beginning of the transcoding hints metadata of an A/V segment. 

• nframes is of type <int> and describes the length of an A/V segment. 

• I_frame_location gives several possibilities for describing the location 
of I-frames within an A/V segment. 

• select_one_out_of_the_following is of size <2 bit> and selects one of 
the following four I-frame location description methods. 

• first_frame is of size <1 bit> and is the default I-frame location. This 
method describes an A/V segment where only the first frame is an Intra 
frame of the A/V segment and is used as an anchor for further prediction 
and all other frames within the A/V segment are P- or B-frames. 

• list_of_frames gives a list of frame numbers of Intra-frames within an 
A/V segment. This method allows for arbitrarily describing the location 
of Intra-frames within an A/V segment. For k frames within this list, the 
size of this descriptor is <k*int>. 

• first_frame_and_every_k_frames is of type <int>, where the first 
frame within a segment is Intra and k describes the interval of I-frames 
within the A/V segment. 

• no_I_frame is of size <1 bit> and describes the case where no I-frame 
is used within an A/V segment, which is useful when the encoding of the 
A/V segment is based on an anchor (Intra-frame) in a previous segment. 

• quantizer_scale is of type <int> and describes the initial quantizer scale 
value for an A/V segment. 

• target_bitrate is of type <int> and describes the target bit rate per 
second for an A/V segment. 

• target_min_bitrate is of size <int> and describes the minimum target bit 
rate per second for an A/V segment (optional). 

• target_max_bitrate is of size <int> and describes the maximum target 
bit rate per second for an A/V segment (optional). 

• use_transcoding_states is of size <1 bit> and describes whether 
transcoding hint states are used for an A/V segment. 

• transcoding_state_nr is of type <int> and gives the transcoding hint 
metadata state for a segment. The transcoding hint metadata state is a 
pointer to an entry in a table of transcoding hint states. The table of 
transcoding hint states may have several entries, where new entries may 
be added or deleted by transcoding hints parameters. The transcoding 
hints metadata of a single transcoding hint state will be described with 
reference to Fig. 16. 

add_new_transcodin^_state is of size <1 bit> and describes whether a 
new transcoding state with associated information has to be added to the 
transcoding hints table. In case the add_new_transcoding^state signals 
"yes", a list of parameters of the new transcoding hints state is given. 
The size of the parameter list is determined by the number of parameters 
of one transcoding hints state and the number of transcoding hints state. 
remove_transcoding_state is a flag of size <1 bit> indicating whether 

a transcoding state may be removed or not. In case a transcoding state 

may be removed, the state number (type: <int>) of the transcoding state 

to be removed is given. 

use_encoding L _complexity_description is of size < 1 bit> and signals 
whether a more detailed encoding complexity description scheme as 
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defined in Fig. 15 has to be used. 
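The I-frame location options described above (first frame, list of frames, first_frame_and_every_k_frames, and the no_I_frame case) can be resolved into explicit frame numbers, as in the following minimal Python sketch. The function name and the string labels for the methods are hypothetical. The sketch also assumes that frame numbers are absolute and that a segment covers start_frame up to start_frame + nframes - 1; the specification does not state this.

```python
from typing import List, Optional


def intra_frame_numbers(start_frame: int, nframes: int, method: str,
                        frame_list: Optional[List[int]] = None,
                        k: Optional[int] = None) -> List[int]:
    """Return the frame numbers of the Intra-frames of one A/V segment.

    The method labels are illustrative names for the descriptor options.
    """
    end = start_frame + nframes  # first frame number after the segment
    if method == "no_I_frame":
        # The segment depends on an anchor frame in a previous segment.
        return []
    if method == "first_frame":
        # Default: only the first frame is Intra, the rest are P- or B-frames.
        return [start_frame]
    if method == "list_of_frames":
        if frame_list is None:
            raise ValueError("list_of_frames needs frame_list")
        # Keep only entries that lie inside the segment (assumption).
        return sorted(f for f in frame_list if start_frame <= f < end)
    if method == "first_frame_and_every_k_frames":
        if k is None or k <= 0:
            raise ValueError("k must be a positive interval")
        return list(range(start_frame, end, k))
    raise ValueError(f"unknown method: {method}")
```

For a segment of 10 frames starting at frame 100 with k = 4, the sketch yields frames 100, 104 and 108.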
Fig. 15 depicts the coding complexity transcoding hints metadata, which may 
be associated to the whole A/V content or to a specific A/V segment, according to an 
embodiment of the invention. Encoding complexity metadata may be used for rate 
control and determines the quantizer and bit rate settings. 

• use_feature_points is of size <1 bit> and indicates the use of feature 
point based complexity estimation data. 

• select_feature_point_method is of size <2 bits> and selects the feature
point method.

• number_of_new_feature_points_per_frame describes a list of the
number of new feature points per frame, as indicated in Fig. 3, and
is of size <nframes * int>. This metric indicates the amount of
new content per frame.

• feature_point_metrics describes a list of metrics based on the new 
feature points per frame within one segment. The metrics are represented 
as an ordered list of <int> values with the following meaning: mean, 
max, min, variance, standard deviation of the number of the new feature 
points per frame. 

• use_equation_description is an <int> pointer to an equation-based 
description of the encoding complexity per frame. 

• use_motion_description is of size <1 bit> and indicates the use of a
motion-based complexity description.

• select_motion_method is of size <4 bits> and selects the motion
description method.

• param_k_motion is of size <nframes * k * int> and describes the k
parameters for every single frame of a global parametric motion model.

• motion_metrics describes a list of metrics for the whole segment, based
on the size of the motion vectors. The metrics are represented as an
ordered list of <int> values with the following meaning: mean, max, min,
variance, standard deviation of the macroblock motion vectors.

• block_motion_field describes every vector of an m*m block sized
motion field and is of size <nframes*int*size_x*size_y / (m*m)>.

• use_texture_edge_metrics is a flag that is set when texture or edge
metrics are used and it is of size <1 bit>.

• select_texture_edge_metrics is of size <4 bits> and it determines which
texture metric from the following is used.

• DCT_block_energy is the sum of all DCT-coefficients of one block and
is defined for every block within a frame. It is of size
<size_y*size_x*nframes*int/64>.

• DCT_block_activity is defined as the sum of all DCT-coefficients of
one block but without the DC coefficient. It is defined for every block
within a frame and is of size <size_y*size_x*nframes*int/64>.

• DCT_energy_metric describes a list of metrics for the whole segment,
based on the individual DCT energies of each block. The metrics are
represented as an ordered list of <int> values with the following
meaning: mean, max, min, variance, standard deviation of all the
individual DCT energy metrics. The size of the descriptor is <6*int>.
An alternative implementation of this descriptor is to describe the DCT
energy metric for every single frame of the video segment.

• DCT_activity_metric describes a list of metrics for the whole segment,
based on the individual DCT activities of each block. The metrics are
represented as an ordered list of <int> values with the following
meaning: mean, max, min, variance, standard deviation of all the
individual DCT activity metrics. The size of the descriptor is <6*int>.
An alternative implementation of this descriptor is to describe the DCT
activity metric for every single frame of the video segment.
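The DCT-based texture descriptors above can be illustrated with a short Python sketch. It follows the wording of the text literally (a plain sum of coefficients, with or without the DC term) and assumes that the 64 coefficients of each 8x8 block are already available, with the DC coefficient first. The function names and the choice of population variance are assumptions. The text stores the metrics as <int> values, whereas this sketch leaves them unrounded.

```python
from statistics import mean, pstdev, pvariance
from typing import Dict, Sequence


def dct_block_energy(coeffs: Sequence[int]) -> int:
    """Sum of all DCT coefficients of one block (coeffs[0] is the DC term)."""
    return sum(coeffs)


def dct_block_activity(coeffs: Sequence[int]) -> int:
    """Sum of the DCT coefficients of one block without the DC coefficient."""
    return sum(coeffs[1:])


def segment_metrics(values: Sequence[float]) -> Dict[str, float]:
    """Mean, max, min, variance and standard deviation over one segment."""
    return {
        "mean": mean(values),
        "max": max(values),
        "min": min(values),
        "variance": pvariance(values),  # population variance (assumption)
        "stddev": pstdev(values),
    }


def values_per_segment(size_x: int, size_y: int, nframes: int,
                       block: int = 8) -> int:
    """Number of per-block values in a segment: size_x*size_y*nframes/64
    for 8x8 blocks, matching the descriptor sizes given in the text."""
    return size_x * size_y * nframes // (block * block)
```

The same `segment_metrics` helper could be applied to the list of new feature points per frame or to motion vector magnitudes, since those descriptors use the same five statistics.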

Fig. 16 depicts the transcoding hints state metadata, which may be associated 
to the whole audio-visual content or to a specific A/V segment according to an 
embodiment of the invention. Relevant Descriptors and Description Schemes may 
include: 

• M is of type <int> and describes the I-frame/P-frame distance. 

• bitrate_fraction_for_I is of type <float> and describes the fraction of
the bit rate defined for an A/V segment that is available for I-frames.

• bitrate_fraction_for_P is of type <float> and describes the fraction of
the bit rate defined for an A/V segment that may be used for P-frames.
The bit rate fraction for B-frames is the remainder up to 100 %.

• quantizer_scale_ratio_I_P is of type <float> and denotes the relation 
of the quantizer scale (as defined for this segment) between I- and 
P-frames. 

• quantizer_scale_ratio_I_B is of type <float> and denotes the relation
of the quantizer scale (as defined for this segment) between I- and
B-frames. It is noted that either the bit rate descriptors
(bitrate_fraction_for_I, bitrate_fraction_for_P), the quantizer_scale_ratio
descriptors (quantizer_scale_ratio_I_P, quantizer_scale_ratio_I_B) or
the following rate-control parameters may be mandatory.

• X_I, X_P, X_B are frame_vbv_complexities and are each of type <int>
and are defined in case of frame based compression target format (cf.
Fig. 12). These and the following Virtual Buffer Verifier ("VBV")
complexity adjustments may be optional and may be used to modify the
rate control scheme according to the source content characteristics and
the target format definition.

• X_I_top, X_P_top, X_B_top are field_vbv_complexities for the top field
and are each of type <int> and are defined in case of field based
compression target format (cf. Fig. 12).

• X_I_bot, X_P_bot, X_B_bot are field_vbv_complexities for the bottom
field and are each of type <int> and are defined in case of field based
compression target format (cf. Fig. 12).
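The bit rate descriptors of a transcoding hints state can be turned into per-frame-type budgets, as in the following Python sketch. It assumes that the fractions are expressed in the range 0 to 1 and that the B-frame share is whatever remains. The class name, the helper function and the dictionary used as a state table are hypothetical; only the field names come from the descriptors above.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class TranscodingHintState:
    M: int                         # I-frame/P-frame distance
    bitrate_fraction_for_I: float  # assumed to lie in [0, 1]
    bitrate_fraction_for_P: float  # assumed to lie in [0, 1]

    def bitrate_fraction_for_B(self) -> float:
        # B-frames receive what remains after the I- and P-frame shares.
        rest = 1.0 - self.bitrate_fraction_for_I - self.bitrate_fraction_for_P
        if rest < -1e-9:
            raise ValueError("I and P fractions exceed 100 %")
        return max(rest, 0.0)


def split_target_bitrate(state: TranscodingHintState,
                         target_bitrate: int) -> Dict[str, int]:
    """Split target_bitrate (bits per second) among I-, P- and B-frames."""
    state.bitrate_fraction_for_B()  # validates the fractions
    i_bits = round(target_bitrate * state.bitrate_fraction_for_I)
    p_bits = round(target_bitrate * state.bitrate_fraction_for_P)
    return {"I": i_bits, "P": p_bits, "B": target_bitrate - i_bits - p_bits}


# Hypothetical table of transcoding hint states, keyed by transcoding_state_nr.
states: Dict[int, TranscodingHintState] = {}
# Corresponds to add_new_transcoding_state signalling "yes".
states[0] = TranscodingHintState(M=3, bitrate_fraction_for_I=0.5,
                                 bitrate_fraction_for_P=0.25)
budget = split_target_bitrate(states[0], 4_000_000)
# Corresponds to remove_transcoding_state with state number 0.
del states[0]
```

A segment then only needs to carry transcoding_state_nr to refer to one of the stored parameter sets.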

It will thus be seen that the objects set forth above, among those made apparent 
from the preceding description, are efficiently attained and, because certain changes 
may be made in carrying out the above method and in the construction(s) set forth
without departing from the spirit and scope of the invention, it is intended that all 
matter contained in the above description and shown in the accompanying drawings 
shall be interpreted as illustrative and not in a limiting sense. 

It is also to be understood that the following claims are intended to cover all of 
the generic and specific features of the invention herein described and all statements 
of the scope of the invention which, as a matter of language, might be said to fall 
therein.

Claims

1. A video/audio signal processing method for processing supplied video/audio 
signals, comprising the steps of: 

describing transcoding target bitstream parameters; 
extracting transcoding hints metadata; 
storing the transcoding hints metadata; 
separating A/V material into segments; 

associating the transcoding hints metadata to the separated A/V segments; and 
transcoding the A/V material. 

2. A video/audio signal processing method according to claim 1, wherein the step 
of describing the transcoding target bitstream parameters comprises the steps of: 

defining a bit rate of a second bitstream of compressed images; 
defining a size of pictures of the second bitstream of compressed images; 
defining a number of frames per second of the second bitstream of compressed 
images; 

defining an aspect ratio of a pel of the second bitstream of compressed images; 
defining a color depth of each of the pel of the second bitstream of compressed 
images; 

defining whether progressive format is used for the second bitstream of 
compressed images; 

defining whether interlaced format is used for the second bitstream of
compressed images;

defining whether frame pictures are used for the second bitstream of compressed 
images; 

defining whether field pictures are used for the second bitstream of compressed
images; and

defining a compression method of the second bitstream of compressed images. 
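The target bitstream parameters listed in claim 2 can be grouped into a single record, as in the following Python sketch. The field names follow the format definition listings reproduced after the claims. The class itself, the use of booleans for the 1-bit flags, and the comparison helper are assumptions made for illustration.

```python
from dataclasses import dataclass, fields
from typing import List, Tuple

# Compression methods named in the format definition listing.
COMPRESSION_METHODS = ("MPEG-1", "MPEG-2", "MPEG-4", "DV", "H.263", "H.261")


@dataclass(frozen=True)
class BitstreamParameters:
    bitrate: int                        # bits per second
    size_of_pictures: Tuple[int, int]   # (width, height) in pels
    number_of_frames_per_second: int
    pel_aspect_ratio: float
    pel_colour_depth: int
    progressive: bool                   # False means interlaced (assumption)
    frame_pictures: bool                # False means field pictures (assumption)
    compression_method: str

    def __post_init__(self) -> None:
        if self.compression_method not in COMPRESSION_METHODS:
            raise ValueError(f"unknown method: {self.compression_method}")


def changed_parameters(source: BitstreamParameters,
                       target: BitstreamParameters) -> List[str]:
    """Names of the parameters in which the target differs from the source."""
    return [f.name for f in fields(source)
            if getattr(source, f.name) != getattr(target, f.name)]


source = BitstreamParameters(8_000_000, (720, 576), 25, 1.0, 8,
                             False, True, "MPEG-2")
target = BitstreamParameters(1_000_000, (352, 288), 25, 1.0, 8,
                             True, True, "MPEG-4")
differences = changed_parameters(source, target)
```

Here `differences` lists the bit rate, the picture size, the scan format and the compression method, which are the properties a transcoder would have to convert for this source and target pair.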

3. A video/audio signal processing method according to claim 2, wherein the step 
of describing the transcoding target bitstream parameters further comprises the step of 
defining employed compression standards as defined by MPEG (Moving Pictures 
Expert Group). 

4. A video/audio signal processing method according to claim 2, wherein the step 
of describing the transcoding target bitstream parameters further comprises the step of 
defining employed compression standards as defined by ITU-T (International 
Telecommunications Union Technical Standards Group). 

5. A video/audio signal processing method according to claim 1, wherein the step 

of extracting the transcoding hints metadata comprises the steps of: 

receiving a first bitstream of compressed image data having a first GOP 
structure; 

obtaining first motion information from the first bitstream; 
obtaining texture/edge information of a first segmentation; 
obtaining feature points and associated motion information from the first 
bitstream; and

obtaining region of interest information from the first bitstream.

6. A video/audio signal processing method according to claim 5, wherein the step 
of extracting the transcoding hints metadata further comprises the step of storing the 
first motion information as transcoding hints metadata. 

7. A video/audio signal processing method according to claim 5, wherein the step 
of extracting the transcoding hints metadata further comprises the step of representing 
motion-related transcoding hints metadata as parameters of a parametric motion model. 

8. A video/audio signal processing method according to claim 7, wherein the step 
of extracting the transcoding hints metadata further comprises the step of employing 
the parametric motion model to describe a global motion within subsequent rectangular 
video frames. 

9. A video/audio signal processing method according to claim 7, wherein the step 
of extracting the transcoding hints metadata further comprises the step of employing 
the parametric motion model to describe a motion within a defined region of arbitrary 
shape. 

10. A video/audio signal processing method according to claim 9, wherein the 
parametric motion model is employed to describe the motion within the defined region 
of arbitrary shape as used within MPEG-4. 

11. A video/audio signal processing method according to claim 5, wherein the step 
of extracting the transcoding hints metadata further comprises the step of representing 
motion-related transcoding hints metadata as an array of motion vectors contained in
the first bitstream of the compressed image data.

12. A video/audio signal processing method according to claim 5, wherein the step 
of extracting the transcoding hints metadata further comprises the step of representing 
motion-related transcoding hints metadata as an array of motion vectors derived from 
motion vectors contained in the first bitstream of the compressed image data. 

13. A video/audio signal processing method according to claim 5, wherein the step 
of extracting the transcoding hints metadata further comprises the step of representing 
motion-related transcoding hints metadata as a list of feature points with associated 
motion vectors, which are tracked within subsequent frames. 

14. A video/audio signal processing method according to claim 5, wherein the step 
of extracting the transcoding hints metadata further comprises the step of representing 
motion-related transcoding hints metadata as a list of feature points with associated 
motion vectors, which are tracked within arbitrarily shaped regions, within subsequent 
frames. 

15. A video/audio signal processing method according to claim 5, wherein the step 
of extracting the transcoding hints metadata further comprises the step of representing 
texture-related transcoding hints metadata as one of a list of DCT-coefficients and a 
measure (one of mean, minimum, maximum, variance, and standard deviation) derived 
thereof. 

16. A video/audio signal processing method according to claim 5, wherein the step 
of extracting the transcoding hints metadata further comprises the step of representing
edge-related transcoding hints metadata as one of a list of DCT-coefficients and a
measure (one of mean, minimum, maximum, variance, and standard deviation) derived 
thereof. 

17. A video/audio signal processing method according to claim 5, wherein the step 
of extracting the transcoding hints metadata further comprises the step of representing 
the feature points and associated motion-related transcoding hints metadata as a list. 

18. A video/audio signal processing method according to claim 5, wherein the step 
of extracting the transcoding hints metadata further comprises the step of representing 
encoding-complexity-related transcoding hints metadata as a complexity metric derived 
from a life-time list of feature points tracked within subsequent frames by using a 
number of lost and new feature points from one frame to a next frame.

19. A video/audio signal processing method according to claim 1, wherein the step 
of storing the transcoding hints metadata comprises the step of maintaining a buffer 
containing transcoding hints metadata for several situations. 

20. A video/audio signal processing method according to claim 19, wherein the step 
of storing the transcoding hints metadata further comprises the step of storing 
individual general transcoding hints metadata for several target devices. 

21. A video/audio signal processing method according to claim 19, wherein the step
of storing the transcoding hints metadata further comprises the step of storing general 
transcoding hints metadata for A/V segments of varying scene activity. 

22. A video/audio signal processing method according to claim 1, wherein the step
of separating the A/V material into segments comprises the steps of:

using feature points with associated motion vectors;

tracking the feature points and keeping a life-time of feature points; and

determining a new A/V segment for transcoding based on a number of feature
points that could not be tracked from one frame to a next frame.
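The segmentation step of claim 22 can be sketched in Python as follows. The sketch assumes that a tracker has already produced, for each frame, the set of identifiers of the feature points visible in that frame. A new segment starts whenever more than `max_lost` points of the previous frame could not be tracked. The function name, the identifier sets and the single fixed threshold are hypothetical; the claim does not prescribe them.

```python
from typing import List, Sequence, Set


def segment_starts(ids_per_frame: Sequence[Set[int]],
                   max_lost: int) -> List[int]:
    """Return the frame indices at which a new A/V segment begins.

    ids_per_frame[n] holds the identifiers of the feature points that are
    tracked in frame n (assumed to come from a feature point tracker).
    """
    if not ids_per_frame:
        return []
    starts = [0]  # the first frame always opens a segment
    for n in range(1, len(ids_per_frame)):
        # Points present in the previous frame but missing in this one.
        lost = len(ids_per_frame[n - 1] - ids_per_frame[n])
        if lost > max_lost:
            starts.append(n)
    return starts
```

With four frames in which all points of the second frame disappear in the third, and `max_lost` set to 2, the sketch reports segment starts at frames 0 and 2.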

23. A video/audio signal processing method according to claim 1, wherein the step 
of associating the transcoding hints metadata to the separated A/V segments comprises 
the steps of: 

calculating a number of new feature points per frame; 

determining if the number of new feature points exceeds some thresholds; and 
selecting based on said determination one of several transcoding hints states. 
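The association step of claim 23 can be sketched in Python as follows, under the assumption that the transcoding hints states are ordered by increasing scene activity and that the state index equals the number of thresholds strictly exceeded. The function names and the threshold values in the usage note are hypothetical.

```python
from bisect import bisect_left
from typing import List, Sequence, Set


def new_feature_points_per_frame(ids_per_frame: Sequence[Set[int]]) -> List[int]:
    """Count, for every frame, the feature points absent from the previous frame."""
    counts: List[int] = []
    previous: Set[int] = set()
    for current in ids_per_frame:
        counts.append(len(current - previous))
        previous = current
    return counts


def select_state(new_points: int, thresholds: Sequence[int]) -> int:
    """Return the index of the transcoding hints state to use.

    The index equals the number of thresholds that new_points strictly
    exceeds, so len(thresholds) + 1 states are distinguished (assumption).
    """
    return bisect_left(sorted(thresholds), new_points)
```

With thresholds 10 and 40, a frame with 10 new points maps to state 0, one with 11 new points to state 1, and one with 41 new points to state 2.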

24. A video/audio signal processing method according to claim 1, wherein the step 
of transcoding the A/V material comprises the steps of: 

receiving a first bitstream of compressed image data having a first GOP 
structure; 

extracting transcoding hints metadata from the first bitstream; 
utilizing the transcoding hints metadata associated to the first bitstream to 
facilitate transcoding; and 

outputting a second bitstream. 

25. A video/audio signal processing method according to claim 24, wherein the step 
of transcoding the A/V material further comprises the step of utilizing the transcoding
hints metadata associated to temporal segments of the first bitstream to facilitate
transcoding. 

26. A video/audio signal processing method according to claim 24, wherein the step 
of transcoding the A/V material further comprises the step of utilizing the transcoding 
hints metadata associated to spatial segments of the first bitstream to facilitate 
transcoding. 

27. A video/audio signal processing method according to claim 24, wherein the step 
of transcoding the A/V material further comprises the step of utilizing motion 
information contained in the transcoding hints metadata to extrapolate second motion 
information for the second bitstream of compressed image data having a second GOP 
structure different from the first GOP structure. 

28. A video/audio signal processing method according to claim 24, wherein the step 
of transcoding the A/V material further comprises the step of controlling a bit rate of 
the second bitstream so that a bit rate of the first bitstream is different from the bit rate 
of the second bitstream. 

29. A video/audio signal processing method according to claim 28, wherein the step 
of transcoding the A/V material further comprises the step of adjusting a size of 
pictures represented by the first bitstream so that pictures represented by the second 
bitstream exhibit a size different from the size of the pictures represented by the first
bitstream. 

30. A video/audio signal processing method according to claim 24, wherein the step
of transcoding the A/V material further comprises the step of adjusting a size of
pictures represented by the first bitstream so that pictures represented by the second 
bitstream exhibit a size different from the size of the pictures represented by the first 
bitstream. 

31. A video/audio signal processing method according to claim 30, wherein the step 
of transcoding the A/V material further comprises the step of encoding the pictures 
represented by the second bitstream as field pictures when the pictures represented by 
the first bitstream are encoded as frame pictures. 

32. A video/audio signal processing method according to claim 30, wherein the step 
of transcoding the A/V material further comprises the step of encoding the pictures
represented by the second bitstream as frame pictures when the pictures represented 
by the first bitstream are encoded as field pictures. 

33. A video/audio signal processing method according to claim 30, wherein the step 
of transcoding the A/V material further comprises the step of interlacing the pictures 
represented by the first bitstream when the pictures represented by the first bitstream 
are received as a progressive sequence so that the pictures represented by the second 
bitstream are output as an interlaced sequence. 

34. A video/audio signal processing method according to claim 30, wherein the step 
of transcoding the A/V material further comprises the step of de-interlacing the pictures 
represented by the first bitstream when the pictures represented by the first bitstream 
are received as an interlaced sequence so that pictures represented by the second
bitstream are output as a progressive sequence.

35. A video/audio signal processing method according to claim 24, wherein the step 
of transcoding the A/V material further comprises the step of encoding pictures 
represented by the second bitstream as field pictures when pictures represented by the
first bitstream are encoded as frame pictures. 

36. A video/audio signal processing method according to claim 24, wherein the step 
of transcoding the A/V material further comprises the step of encoding pictures 
represented by the second bitstream as frame pictures when pictures represented by the 
first bitstream are encoded as field pictures.

37. A video/audio signal processing method according to claim 24, wherein the step 
of transcoding the A/V material further comprises the step of interlacing pictures 
represented by the first bitstream when pictures represented by the first bitstream are 
received as a progressive sequence so that pictures represented by the second bitstream 
are output as an interlaced sequence. 

38. A video/audio signal processing method according to claim 24, wherein the step 
of transcoding the A/V material further comprises the step of de-interlacing pictures 
represented by the first bitstream when pictures represented by the first bitstream are 
received as an interlaced sequence so that pictures represented by the second bitstream 
are output as a progressive sequence. 

39. A transcoding method, comprising the steps of: 

receiving a first bitstream of compressed image data representing pictures of a
first size;

extracting first motion-related transcoding hints metadata from the first 
bitstream; 

storing the first motion-related transcoding hints metadata; 

utilizing the stored first motion-related transcoding hints metadata to extrapolate
second motion information for a second bitstream of compressed image data 
representing pictures of a second size different from the first size; and 

outputting the second bitstream. 
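Claim 39 does not state how the second motion information is extrapolated for the second picture size. One simple possibility, shown here only as an illustrative stand-in, is to scale each stored motion vector by the ratio of the picture dimensions. The function name and the (dx, dy) tuple representation are assumptions.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[float, float]  # (dx, dy) displacement in pels


def scale_motion_vectors(vectors: Sequence[Vector],
                         src_size: Tuple[int, int],
                         dst_size: Tuple[int, int]) -> List[Vector]:
    """Scale stored motion vectors from the first picture size to the second.

    src_size and dst_size are (width, height). Linear scaling is an
    assumption; the claim leaves the extrapolation method open.
    """
    sx = dst_size[0] / src_size[0]
    sy = dst_size[1] / src_size[1]
    return [(dx * sx, dy * sy) for dx, dy in vectors]
```

Halving both dimensions, for example from 704x576 to 352x288, halves every vector, so a displacement of (8, -4) becomes (4.0, -2.0). A real transcoder would typically refine such a prediction with a small local search.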

40. A transcoding method, comprising the steps of: 

receiving a first bitstream of compressed image data representing pictures 
defining an interlaced sequence; 

extracting first motion-related transcoding hints metadata from the first 
bitstream; 

storing the first motion-related transcoding hints metadata; 

utilizing the stored first motion-related transcoding hints metadata to extrapolate
second motion information for a second bitstream of compressed image data 
representing pictures defining a progressive sequence; and 

outputting the second bitstream. 

41. A transcoding method, comprising the steps of: 

receiving a first bitstream of compressed image data representing pictures 
defining a progressive sequence;

extracting first motion-related transcoding hints metadata from the first
bitstream; 

storing the first motion-related transcoding hints metadata; 

utilizing the stored first motion-related transcoding hints metadata to extrapolate
second motion information for a second bitstream of compressed image data 
representing pictures defining an interlaced sequence; and 

outputting the second bitstream. 

42. A transcoding method, comprising the steps of: 

receiving a first bitstream of compressed image data representing frame pictures; 
extracting first motion-related transcoding hints metadata from the first 
bitstream; 

storing the first motion-related transcoding hints metadata; 

utilizing the stored first motion-related transcoding hints metadata to extrapolate 
second motion information for a second bitstream of compressed image data 
representing field pictures; and 

outputting the second bitstream. 

43. A transcoding method, comprising the steps of: 

receiving a first bitstream of compressed image data representing field pictures; 
extracting first motion-related transcoding hints metadata from the first 
bitstream; 

storing the first motion-related transcoding hints metadata;

utilizing the stored first motion-related transcoding hints metadata to extrapolate
second motion information for a second bitstream of compressed image data 
representing frame pictures; and 

outputting the second bitstream. 

44. A transcoding method, comprising the steps of:

receiving a first bitstream of compressed image data representing a main image; 
extracting first motion-related transcoding hints metadata from the first 
bitstream; 

storing the first motion-related transcoding hints metadata; 

utilizing the stored first motion-related transcoding hints metadata to extrapolate
second motion information for a second bitstream of compressed image data 
representing a portion of the main image; and 

outputting the second bitstream. 

45. A transcoding method, comprising the steps of: 

receiving a first bitstream of compressed image data having a plurality of coding 
parameters including at least one of a GOP structure, a picture size, a bit rate, a frame 
picture format, a field picture format, a progressive sequence, and an interlaced 
sequence; 

extracting first motion-related transcoding hints metadata from the first
bitstream; 

storing the first motion-related transcoding hints metadata;

utilizing the stored first motion-related transcoding hints metadata to extrapolate
second motion information for a second bitstream of compressed image data having 
a plurality of coding parameters such that one or more of the coding parameters of the 
second bitstream are different from the coding parameters of the first bitstream; and 
outputting the second bitstream. 

46. A transcoding method, comprising the steps of:

receiving a first bitstream of compressed image data representing pictures of a 
first size; 

extracting first feature point motion-related transcoding hints metadata from the 
first bitstream; 

storing the first feature point motion-related transcoding hints metadata; 

utilizing the stored first feature point motion-related transcoding hints metadata 
to extrapolate second motion information for a second bitstream of compressed image 
data representing pictures of a second size different from the first size; and 

outputting the second bitstream. 

47. A transcoding method, comprising the steps of: 

receiving a first bitstream of compressed image data representing pictures 
defining an interlaced sequence; 

extracting first feature point motion-related transcoding hints metadata from the 
first bitstream; 

storing the first feature point motion-related transcoding hints metadata;

utilizing the stored first feature point motion-related transcoding hints metadata
to extrapolate second motion information for a second bitstream of compressed image 
data representing pictures defining a progressive sequence; and 

outputting the second bitstream. 

48. A transcoding method, comprising the steps of: 

receiving a first bitstream of compressed image data representing pictures 
defining a progressive sequence; 

extracting first feature point motion-related transcoding hints metadata from the 
first bitstream; 

storing the first feature point motion-related transcoding hints metadata; 

utilizing the stored first feature point motion-related transcoding hints metadata 
to extrapolate second motion information for a second bitstream of compressed image 
data representing pictures defining an interlaced sequence; and 

outputting the second bitstream.

49. A transcoding method, comprising the steps of: 

receiving a first bitstream of compressed image data representing frame pictures; 
extracting first feature point motion-related transcoding hints metadata from the 
first bitstream; 

storing the first feature point motion-related transcoding hints metadata; 
utilizing the stored first feature point motion-related transcoding hints metadata 
to extrapolate second motion information for a second bitstream of compressed image
data representing field pictures; and

outputting the second bitstream. 

50. A transcoding method, comprising the steps of: 

receiving a first bitstream of compressed image data representing field pictures; 
extracting first feature point motion-related transcoding hints metadata from the
first bitstream; 

storing the first feature point motion-related transcoding hints metadata; 

utilizing the stored first feature point motion-related transcoding hints metadata 
to extrapolate second motion information for a second bitstream of compressed image 
data representing frame pictures; and 

outputting the second bitstream. 

51. A transcoding method, comprising the steps of: 

receiving a first bitstream of compressed image data representing a main image; 
extracting first feature point motion-related transcoding hints metadata from the first 
bitstream; 

storing the first feature point motion-related transcoding hints metadata; 

utilizing the stored first feature point motion-related transcoding hints metadata 
to extrapolate second motion information for a second bitstream of compressed image 
data representing a portion of the main image; and 

outputting the second bitstream. 

52. A transcoding method, comprising the steps of:

receiving a first bitstream of compressed image data having a plurality of coding
parameters including at least one of a GOP structure, a picture size, a bit rate, a frame 
picture format, a field picture format, a progressive sequence, and an interlaced 
sequence; 

extracting first feature point motion-related transcoding hints metadata from the 
first bitstream; 

storing the first feature point motion-related transcoding hints metadata; 

utilizing the stored first feature point motion-related transcoding hints metadata 
to extrapolate second motion information for a second bitstream of compressed image 
data having a plurality of coding parameters such that one or more of the coding 
parameters of the second bitstream are different from the coding parameters of the first 
bitstream; and 

outputting the second bitstream. 

53. A video processing method for processing supplied video signals, comprising 
the steps of: 

receiving a source video; and 

classifying contents of the source video using one of motion metadata, 
texture/edge metadata, and feature points and associated motion metadata, including 
a number of new feature points per frame. 

54. A video processing method according to claim 53, wherein said method is used 
for determining transcoding parameters settings of a transcoder.

55. A video processing method according to claim 53, wherein said method is used
for organizing audiovisual material based on the classification of the contents of the 
source video. 

56. An apparatus for processing supplied video/audio signals, comprising: 

a target buffer for storing at least one description of transcoding target bitstream 
parameters; 

an extraction unit for extracting transcoding hints metadata based on the at least 
one description; 

a buffer for storing the transcoding hints metadata; 

a segmenting unit for separating A/V material into segments; and 

a transcoding unit for associating the transcoding hints metadata to the separated 
A/V segments and transcoding the A/V material. 

57. A transcoding apparatus, comprising: 

an input for receiving a first bitstream of compressed image data representing 
pictures of a first size; 

a transcoding hints metadata extraction unit for extracting transcoding hints 
metadata from the first bitstream; 

a buffer for storing the transcoding hints metadata; 

a processing unit for utilizing the stored transcoding hints metadata to 
extrapolate motion information for a second bitstream of compressed image data 
different from the first bitstream; and

an output for outputting the second bitstream.

58. An apparatus for processing supplied video signals, comprising:

an input for receiving a source video; and

a processor for classifying contents of the source video using one of motion 
metadata, texture/edge metadata, and feature points and associated motion metadata, 
including a number of new feature points per frame.

1: bitrate <int>
2: size_of_pictures <2*int>
3: number_of_frames_per_second <int>
4: pel_aspect_ratio <float>
5: pel_colour_depth <int>
6: usage_of_progressive_interlaced_format <1 bit>
7: usage_of_frame_field_pictures <1 bit>
8: compression_method <int>
9: one_out_of_list {MPEG-1, MPEG-2, MPEG-4, DV, H.263, H.261}
10: { further parameters for compression method }
1 1 : GOP_structure (Runiength coding) 

1: bitrate <int> 

2: size_of_pictures <2*int> 

3: number_of_frames_per_second <int> 

4: pel_aspect_ratio <float> 

5: pel_colour_depth <int> 

6: usage_of_progressive_interlaced_format <1bit> 
7: usage_of_frame_field_pictures <1bit> 
8: compression_method <int> 

9: one_out_of_list {MPEG-1, MPEG-2, MPEG-4, DV, H.263, H.261 } 

10: { further parameters for compression method } 
11: GOP_structure (Runlength coding) 
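
By way of illustration only, the format description listed above could be held in a structure such as the following Python sketch. The class and function names, the numeric codes of the enumeration, the polarity of the two one-bit flags, and the (picture type, count) form chosen for the run-length coded GOP structure are assumptions made for this example and are not taken from the listing.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import groupby
from typing import List, Tuple


class CompressionMethod(Enum):
    # Numeric codes are hypothetical; the listing only names the methods.
    MPEG_1 = 0
    MPEG_2 = 1
    MPEG_4 = 2
    DV = 3
    H_263 = 4
    H_261 = 5


def run_length_encode(gop: str) -> List[Tuple[str, int]]:
    """Collapse a picture-type string such as 'IBBP' into (type, count) runs."""
    return [(ptype, len(list(run))) for ptype, run in groupby(gop)]


@dataclass
class FormatDescription:
    bitrate: int
    size_of_pictures: Tuple[int, int]      # <2*int>, assumed (width, height)
    number_of_frames_per_second: int
    pel_aspect_ratio: float
    pel_colour_depth: int
    progressive: bool                      # assumed True = progressive
    frame_pictures: bool                   # assumed True = frame pictures
    compression_method: CompressionMethod
    gop_structure: List[Tuple[str, int]]   # run-length coded picture types


# Example values are invented for demonstration.
source = FormatDescription(
    bitrate=6_000_000,
    size_of_pictures=(720, 576),
    number_of_frames_per_second=25,
    pel_aspect_ratio=1.0,
    pel_colour_depth=8,
    progressive=False,
    frame_pictures=True,
    compression_method=CompressionMethod.MPEG_2,
    gop_structure=run_length_encode("IBBPBBP"),
)
```

The same structure may serve for both the source and the target format, since the two listings carry the same fields.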

1: use_region_of_interest_DS: <1bit> 

2: region_of_interest_DS: 

3: shape_D: select one of {boundary_box_D, MB_shape_D, shape_D} 

4: motion_trajectory_D 

5: 

6: use_editing_effects_transcoding_hints_DS: <1bit> 

7: camera_flash {frame1, frame2, ..., framek} <k*int> 

8: cross_fading {(start_frame, end_frame), ...} <k*(<int>, <int>)> 

9: black_pictures {(start_frame, end_frame), ...} <k*(<int>, <int>)> 

10: fade_in {(start_frame, end_frame), ...} <k*(<int>, <int>)> 

11: fade_out {(start_frame, end_frame), ...} <k*(<int>, <int>)> 

12: abrupt_change {frame1, frame2, ..., framek} <k*int> 

13: 

14: use_motion_transcoding_hints_DS: <1bit> 

15: number_of_regions: <int> 

16: for_every_region: 

17: is_region_rectangular_shaped (y/n): <1bit> 

18: if_arbitrarily_shaped: use region_D for this region 

19: describe parametric object motion for this region 

1: start_frame <int> 
2: nframes <int> 
3: I_frame_location: 

4: select_one_out_of_the_following: <2bit> 
5: first frame (default) 

6: list of frames {frame1, frame2, ..., framek} <k*int> 

7: first_frame_and_every_k_frames <int> 

8: no_I_frame 

9: quantizer_scale <int> 

10: target_bitrate <int> 

11: target_min_bitrate <int> 

12: target_max_bitrate <int> 

13: use_transcoding_states (y/n) <1bit> 

14: transcoding_state_nr <int> 

15: add_new_transcoding_state (y/n) <1bit> 

16: if yes: {list of parameters} 

17: remove_transcoding_state (y/n) <1bit> 

18: if yes: state_nr <int> 

19: use_encoding_complexity_description (y/n) <1bit> 

20: if yes: encoding_complexity_description_scheme 
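
By way of illustration only, the four I-frame location options of the listing above could be resolved into concrete frame numbers as in the following Python sketch. The enumeration values assigned to the 2-bit selector, the function name, and the reading of "first_frame_and_every_k_frames" as start_frame, start_frame + k, start_frame + 2k, and so on within the segment are assumptions made for this example.

```python
from enum import Enum
from typing import List, Optional, Sequence


class IFrameLocation(Enum):
    # Code values for the 2-bit selector are hypothetical.
    FIRST_FRAME = 0                      # default
    LIST_OF_FRAMES = 1
    FIRST_FRAME_AND_EVERY_K_FRAMES = 2
    NO_I_FRAME = 3


def i_frame_positions(
    start_frame: int,
    nframes: int,
    mode: IFrameLocation = IFrameLocation.FIRST_FRAME,
    frames: Sequence[int] = (),
    k: Optional[int] = None,
) -> List[int]:
    """Return the frame numbers inside the segment to be coded as I-frames."""
    end = start_frame + nframes  # exclusive upper bound of the segment
    if mode is IFrameLocation.FIRST_FRAME:
        return [start_frame] if nframes > 0 else []
    if mode is IFrameLocation.LIST_OF_FRAMES:
        # Keep only listed frames that actually fall inside the segment.
        return sorted(f for f in set(frames) if start_frame <= f < end)
    if mode is IFrameLocation.FIRST_FRAME_AND_EVERY_K_FRAMES:
        if k is None or k <= 0:
            raise ValueError("k must be a positive integer")
        return list(range(start_frame, end, k))
    return []  # NO_I_FRAME
```

For a segment with start_frame 100 and nframes 10, the third option with k = 4 selects frames 100, 104 and 108 under this reading.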

1: use_feature_points (y/n) <1bit> 

2: select_feature_point_method <2 bits> 

3: number_of_new_feature_points <nframes * int> 

4: feature_point_metrics {mean, max, min, var, stddev} <5*int> 

5: use_equation_description (y/n) <1bit> 

6: use_motion_description (y/n) <1bit> 

7: select_motion_method <4 bits> 

8: param_k_motion <nframes * k * int> 

9: motion_metrics {min, max, sum, var, stddev} <5*int> 

10: block_motion_field <nframes*int*size_x*size_y / (m*m)> 

11: use_texture_edge_metrics (y/n) <1bit> 

12: select_texture_edge_metrics <4 bits> 

13: DCT_block_energy <size_y*size_x*nframes*int/64> 

14: DCT_block_activity <size_y*size_x*nframes*int/64> 

15: DCT_energy_metric {mean, min, max, sum, var, stddev} <6*int> 

16: DCT_activity_metric {mean, min, max, sum, var, stddev} <6*int> 
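
By way of illustration only, the five feature point metrics of the listing above could be derived from the per-frame counts of new feature points as in the following Python sketch. The function name is hypothetical, and the use of the population variance and standard deviation is an assumption, since the listing does not say which estimator is meant. The listing stores the metrics as integers; the rounding rule is not given there and is therefore left out of the sketch.

```python
from statistics import mean, pstdev, pvariance
from typing import Dict, Sequence


def feature_point_metrics(new_points_per_frame: Sequence[int]) -> Dict[str, float]:
    """Summarise number_of_new_feature_points over the frames of a segment."""
    if not new_points_per_frame:
        raise ValueError("at least one frame is required")
    return {
        "mean": mean(new_points_per_frame),
        "max": max(new_points_per_frame),
        "min": min(new_points_per_frame),
        "var": pvariance(new_points_per_frame),    # population variance (assumed)
        "stddev": pstdev(new_points_per_frame),    # population std. dev. (assumed)
    }


# Invented per-frame counts for an eight-frame segment.
metrics = feature_point_metrics([2, 4, 4, 4, 5, 5, 7, 9])
```

The same summary could be applied to the motion and DCT metrics of the listing, with a sum added where the listing calls for one.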

1: M: I/P distance <int> 

2: bitrate_fraction_for_I <float> 

3: bitrate_fraction_for_P <float> /* bitrate_fraction of B is rest to 100% */ 

4: quantizer_scale_ratio_I_P <float> 

5: quantizer_scale_ratio_I_B <float> 

6: if_frame: /* see target format transcoding hints */ 

7: X_I, X_P, X_B <3*int> /* frame_vbv_complexities */ 
8: if_field: 

9: X_I_top, X_P_top, X_B_top <3*int> /* field_top_vbv_complexities */ 
10: X_I_bot, X_P_bot, X_B_bot <3*int> /* field_bottom_vbv_complexities */ 
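
By way of illustration only, the comment in the listing above that the bitrate fraction of B is the rest to 100% can be written out as the following Python sketch. The function names are hypothetical, and it is assumed that the two stored fractions are expressed on a scale of 0 to 1 rather than as percentages.

```python
from typing import Dict


def bitrate_fraction_for_b(fraction_i: float, fraction_p: float) -> float:
    """Return the share left for B pictures after the I and P shares."""
    if fraction_i < 0 or fraction_p < 0 or fraction_i + fraction_p > 1:
        raise ValueError("fractions must be non-negative and sum to at most 1")
    return 1.0 - fraction_i - fraction_p


def split_bitrate(total_bitrate: float, fraction_i: float, fraction_p: float) -> Dict[str, float]:
    """Divide a total bitrate among I, P and B pictures."""
    fraction_b = bitrate_fraction_for_b(fraction_i, fraction_p)
    return {
        "I": total_bitrate * fraction_i,
        "P": total_bitrate * fraction_p,
        "B": total_bitrate * fraction_b,
    }


# Invented example: half of the bits to I pictures, 30% to P pictures.
shares = split_bitrate(1000, 0.5, 0.3)
```

Only the I and P fractions need to be stored, because the B fraction follows from them.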